Abstract 



A famous theorem of Szemeredi asserts that given any density $0 < \delta \le 1$
and any integer $k \ge 3$, any set of integers with density $\delta$ will contain
infinitely many proper arithmetic progressions of length $k$. For general
$k$ there are essentially four known proofs of this fact: Szemeredi's original
combinatorial proof using the Szemeredi regularity lemma and van
der Waerden's theorem, Furstenberg's proof using ergodic theory, Gowers'
proof using Fourier analysis and the inverse theory of additive combinatorics,
and the more recent proofs of Gowers and Rodl-Skokan using a
hypergraph regularity lemma. Of these four, the ergodic theory proof
is arguably the shortest, but also the least elementary, requiring in particular
the use of transfinite induction (and thus the axiom of choice),
decomposing a general ergodic system as the weakly mixing extension
of a transfinite tower of compact extensions. Here we present a quantitative,
self-contained version of this ergodic theory proof, which is
"elementary" in the sense that it does not require the axiom of choice,
the use of infinite sets or measures, or the use of the Fourier transform or
inverse theorems from additive combinatorics. It also gives explicit (but
extremely poor) quantitative bounds.
1 Introduction

A famous theorem of van der Waerden [40] in 1927 states the following. 

Theorem 1.1 (Van der Waerden's theorem). [40] For any integers $k, m \ge 1$
there exists an integer $N = N_{vdW}(k,m) \ge 1$ such that every colouring
$c: \{1,\ldots,N\} \to \{1,\ldots,m\}$ of $\{1,\ldots,N\}$ into $m$ colours contains at least one
monochromatic arithmetic progression of length $k$ (i.e. a progression in $\{1,\ldots,N\}$
of cardinality $k$ on which $c$ is constant).
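For very small parameters the least admissible $N$ in Theorem 1.1 can be found by exhaustive search. The following Python sketch (the function names are hypothetical, not taken from the paper) checks every $m$-colouring of $\{1,\ldots,N\}$ for a monochromatic progression with positive step and recovers the classical value 9 for $k = 3$, $m = 2$. The search is exponential in $N$, so it only illustrates the statement, not any of its proofs.

```python
from itertools import product

def has_mono_ap(colouring, k):
    """True if the colouring (colouring[i] is the colour of i + 1) contains a
    monochromatic progression a, a + r, ..., a + (k - 1) r with step r >= 1."""
    n = len(colouring)
    for a in range(n):
        for r in range(1, n):
            if a + (k - 1) * r >= n:
                break  # larger steps only leave {1, ..., N} sooner
            if len({colouring[a + j * r] for j in range(k)}) == 1:
                return True
    return False

def vdw_number(k, m):
    """Least N such that every m-colouring of {1, ..., N} contains a
    monochromatic k-term progression (exhaustive search; k >= 2, tiny N only)."""
    n = 1
    while True:
        if all(has_mono_ap(c, k) for c in product(range(m), repeat=n)):
            return n
        n += 1

print(vdw_number(3, 2))  # 9
```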

This theorem has by now several proofs; see [42] for a recent exposition of
the original proof, as well as a proof of certain extensions of this theorem; for
the sake of completeness we present the original argument in an Appendix (§11)
below. Another rather different proof can be found in [33]. This theorem was
then generalized substantially in 1975 by Szemeredi [36] (building upon earlier
work in [30], [35]), answering a question of Erdos and Turan [7], as follows:

Theorem 1.2 (Szemeredi's theorem). For any integer $k \ge 1$ and real number
$0 < \delta \le 1$, there exists an integer $N_{SZ}(k,\delta) \ge 1$ such that for every
$N \ge N_{SZ}(k,\delta)$, every set $A \subset \{1,\ldots,N\}$ of cardinality $|A| \ge \delta N$ contains
at least one arithmetic progression of length $k$.

It is easy to deduce van der Waerden's theorem from Szemeredi's theorem
(with $N_{vdW}(k,m) := N_{SZ}(k, 1/m)$) by means of the pigeonhole principle, since
one of the $m$ colour classes must have cardinality at least $N/m$. The
converse implication, however, is substantially less trivial.
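The pigeonhole step can be made concrete: the largest of the $m$ colour classes has at least $N/m$ elements, so applying Theorem 1.2 with $\delta = 1/m$ to that class yields a progression on which the colouring is constant. A minimal Python sketch of the pigeonhole step, with a hypothetical helper name:

```python
from collections import Counter

def dense_colour_class(colouring, m):
    """Pigeonhole step: return (colour, cls), where cls is a colour class of
    size >= N/m inside {1, ..., N}; colouring[i] is the colour of i + 1."""
    colour, size = Counter(colouring).most_common(1)[0]
    assert size * m >= len(colouring)  # size >= N/m by pigeonhole
    return colour, {i + 1 for i, c in enumerate(colouring) if c == colour}

colour, cls = dense_colour_class((0, 1, 2, 1, 1, 0, 2), 3)
print(colour, sorted(cls))  # 1 [2, 4, 5]
```

Any progression found inside `cls` is automatically monochromatic.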

There are many proofs already known for Szemeredi's theorem, which we 
discuss below; the main purpose of this paper is to present yet another such proof.
This may seem somewhat redundant, but we will explain our motivation for 
providing another proof later in this introduction. 

Remarkably, while Szemeredi's theorem appears to be solely concerned with
arithmetic combinatorics, it has spurred much further research in other areas
such as graph theory, ergodic theory, Fourier analysis, and number theory; for
instance it was a key ingredient in the recent result [21] that the primes contain
arbitrarily long arithmetic progressions. Despite the variety of proofs now
available for this theorem, however, it is still regarded as a very difficult result,
except when $k$ is small. The cases $k = 1, 2$ are trivial, and the case $k = 3$ is by
now relatively well understood (see [30], [10], [32], [34], [5], [22], [6] for a variety
of proofs). The case $k = 4$ also has a number of fairly straightforward proofs
(see [35], [31], [18], [8]), although already the arguments here are more sophisticated
than for the $k = 3$ case. However for the case of higher $k$, only four types
of proofs are currently known, all of which are rather deep. The original proof
of Szemeredi [36] is highly combinatorial, relying on van der Waerden's theorem
(Theorem 1.1) and the famous Szemeredi regularity lemma (which itself has
found many other applications, see [24] for a survey); it does provide an upper
bound on $N_{SZ}(k,\delta)$ but it is rather poor (of Ackermann type), due mainly to
the reliance on the van der Waerden theorem and the regularity lemma, both
of which have notoriously bad dependence of the constants. Shortly afterwards,
Furstenberg [9] (see also [14], [10]) introduced what appeared to be a completely
different argument, transferring the problem into one of recurrence in ergodic
theory, and solving that problem by a number of ergodic theory techniques,
notably the introduction of a Furstenberg tower (which is the analogue of the
regularity lemma). This ergodic theory argument is the shortest and most flexible
of all the known proofs, and has been the most successful at leading to
further generalizations of Szemeredi's theorem (see for instance [3], [4], [11],
[12], [13]). On the other hand it uses the axiom of choice and as such does
not provide any effective bounds for the quantity $N_{SZ}(k,\delta)$. The third proof
is more recent, and is due to Gowers [19] (extending earlier arguments in [30],
[18] for small $k$). It is based on combinatorics, Fourier analysis, and inverse
arithmetic combinatorics (in particular multilinear versions of Freiman's theorem
and the Balog-Szemeredi theorem). It gives far better bounds on $N_{SZ}(k,\delta)$
(essentially of double exponential growth in $1/\delta$ rather than Ackermann or iterated
tower growth), but also requires far more analytic machinery and quantitative
estimates. Finally, very recent arguments of Gowers [20] and Rodl, Skokan,
Nagle, Tengan, Tokushige, and Schacht [26], [27], [28], [25], relying primarily on
a hypergraph version of the Szemeredi regularity lemma, have been discovered;
these arguments are somewhat similar in spirit to Szemeredi's original proof (as
well as the proofs in [32], [34] in the $k = 3$ case and [8] in the $k = 4$ case) but
are conceptually somewhat more straightforward (once one accepts the need to
work with hypergraphs instead of graphs, which does unfortunately introduce
a number of additional technicalities). Also these arguments can handle certain
higher dimensional extensions of Szemeredi's theorem first obtained by ergodic
theory methods in [11].

As the above discussion shows, the known proofs of Szemeredi's theorem
are extremely diverse. However, they do share a number of common themes,
principal among which is the establishment of a dichotomy between randomness
and structure. Indeed, in an extremely abstract and heuristic sense, one can describe
all the known proofs of Szemeredi's theorem collectively as follows. Start
with the set $A$ (or some other object which is a proxy for $A$, e.g. a graph, a hypergraph,
or a measure-preserving system). For the object under consideration,
define some concept of randomness (e.g. $\varepsilon$-regularity, uniformity, small Fourier
coefficients, or weak mixing), and some concept of structure (e.g. a nested sequence
of arithmetically structured sets such as progressions or Bohr sets, or a
partition of a vertex set into a controlled number of pieces, a collection of large
Fourier coefficients, a sequence of almost periodic functions, a tower of compact
extensions of the trivial $\sigma$-algebra, or a $(k-2)$-step nilfactor). Obtain some
sort of structure theorem that splits the object into a structured component,
plus an error which is random relative to that structured component. To prove
Szemeredi's theorem (or a variant thereof), one then needs to obtain some sort
of generalized von Neumann theorem to eliminate the random error, and then
some sort of structured recurrence theorem for the structured component.

Obviously there is a great deal of flexibility in executing the above abstract 
scheme, and this explains the large number of variations between the known 
proofs of Szemeredi-type theorems. Also, each of the known proofs finds some
parts of the above scheme more difficult than others. For instance, Furstenberg's 
ergodic theory argument requires some effort (and the axiom of choice) to set 
up the appropriate proxy for A, namely a measure-preserving probability sys- 
tem, and the structured recurrence theorem (which is in this case a recurrence 
theorem for a tower of compact extensions) is also somewhat technical. In the 
Fourier-analytic arguments of Roth and Gowers, the structured component is 
simply a nested sequence of long arithmetic progressions, which makes the rele- 
vant recurrence theorem a triviality; instead, almost all the difficulty resides in 
the structure theorem, or more precisely in enforcing the assertion that lack of 
uniformity implies a density increment on a smaller progression. Gowers' more 
recent hypergraph argument is more balanced, with no particular step being 
exceptionally more difficult than any other, although the fact that hypergraphs 
are involved does induce a certain level of notational and technical complexity 
throughout. Finally, Szemeredi's original argument contains significant portions 
(notably the use of the Szemeredi regularity lemma, and the use of density in- 
crements) which fit very nicely into the above scheme, but also contains some 
additional combinatorial arguments to connect the various steps of the proof 
together. 

In this paper we present a new proof of Szemeredi's theorem (Theorem 1.2)
which implements the above scheme in a reasonably elementary and straightforward
manner. This new proof can best be described as a "finitary" or "quantitative"
version of the ergodic theory proofs of Furstenberg [9], [14], in which
one stays entirely in the realm of finite sets (as opposed to passing to an infinite
limit in the ergodic theory setting). As such, the axiom of choice is not used,
and an explicit bound for $N_{SZ}(k,\delta)$ is in principle possible¹ (although the bound
is extremely poor, perhaps even worse than Ackermann growth, and certainly
worse than the bounds obtained by Gowers [19]). We also borrow some tricks
and concepts from other proofs; in particular from the proof of the Szemeredi
regularity lemma we borrow the $L^2$ incrementation trick in order to obtain a
structure theorem with effective bounds, while from the arguments of Gowers
[19] we borrow the Gowers uniformity norms $U^{k-1}$ to quantify the concept of
randomness. One of our main innovations is to complement these norms with
the (partially dual) uniform almost periodicity norms $UAP^{k-2}$ to quantify the
concept of a uniformly almost periodic function of order $k-2$. This concept
will be defined rigorously later, but suffice it to say for now that a model example
of a uniformly almost periodic function of order $k-2$ is a finite polynomial-trigonometric
sum $F: \mathbf{Z}_N \to \mathbf{C}$ of the form²

$$F(x) := \frac{1}{J} \sum_{j=1}^{J} c_j\, e(P_j(x)/N) \quad \text{for all } x \in \mathbf{Z}_N, \tag{1}$$

where $\mathbf{Z}_N := \mathbf{Z}/N\mathbf{Z}$ is the cyclic group of order $N$, $J \ge 1$ is an integer, the $c_j$
are complex numbers bounded in magnitude by 1, $e(x) := e^{2\pi i x}$, and the $P_j$ are
polynomials of degree at most $k-2$ with coefficients in $\mathbf{Z}_N$. The uniform
almost periodicity norms serve to quantify how closely a function behaves like
(1), and enjoy a number of pleasant properties, most notably that they form a
Banach algebra; indeed one can think of these norms as a higher order variant
of the classical Wiener algebra of functions with absolutely convergent Fourier
series.

¹ It may also be possible in principle to extract some bound for $N_{SZ}(k,\delta)$ directly from
the original Furstenberg argument via proof theory, using such tools as Herbrand's theorem;
see for instance [16] where a similar idea is applied to the Furstenberg-Weiss proof of van der
Waerden's theorem to extract Ackermann-type bounds from what is apparently a nonquantitative
argument. However, to the author's knowledge this program has not been carried out
previously in the literature for the ergodic theory proof of Szemeredi's theorem. Also we incorporate
some other arguments in order to simplify the proof and highlight some new concepts
(such as a new Banach algebra of uniformly almost periodic functions).

² Actually, these functions are a somewhat special class of uniformly almost periodic functions
of order $k-2$, which one might dub the quasiperiodic functions of order $k-2$. The
distinction between the two classes seems closely related to the distinction in ergodic theory
between $(k-2)$-step nilsystems and systems which contain polynomial eigenfunctions of order
$k-2$; see [15], [23] for further discussion of this issue. It is also closely related to the rather
vaguely defined issue of distinguishing "almost polynomial" or "almost multilinear" functions
from "genuinely polynomial" or "genuinely multilinear" functions, a theme which recurs in the
work of Gowers [18], [19], and also in the theorems of Freiman and Balog-Szemeredi from inverse
additive combinatorics which were used in Gowers' work. It seems of interest to quantify
and pursue these issues further.

The argument is essentially self-contained, aside from some basic facts such
as the Weierstrass approximation theorem; the main external ingredient needed
is van der Waerden's theorem (to obtain the recurrence theorem for uniformly
almost periodic functions), and we supply the standard short proof of van der
Waerden's theorem in Appendix §11. As such, we do not require any familiarity
with any of the other proofs of Szemeredi's theorem, although we will
of course discuss the relationship between this proof and the other proofs extensively
in our remarks. In particular we do not use the Fourier transform,
or theorems from inverse arithmetic combinatorics such as Freiman's theorem
or the Balog-Szemeredi theorem, and we do not explicitly use the Szemeredi
regularity lemma either for graphs or hypergraphs (although the proof of that
lemma has some parallels with certain parts of our argument here). Also, while
we do use the language of ergodic, measure, and probability theory, in particular
using the concept of conditional expectation with respect to a $\sigma$-algebra,
we do so entirely in the context of finite sets such as $\mathbf{Z}_N$; as such, a $\sigma$-algebra
is nothing more than a finite partition of $\mathbf{Z}_N$ into "atoms", and conditional
expectation is merely the act of averaging a function on each atom³. Consequently,
we do not need such results from measure theory as the construction of product
measure (or conditional product measure, via Rohlin's lemma [29]), which
plays an important part in the ergodic theory proof, notably in obtaining the
structure and recurrence theorems. Also, we do not use the compactness of
Hilbert-Schmidt or Volterra integral operators directly (which is another key
ingredient in Furstenberg's structure theorem), although we will still need a
quantitative finite-dimensional version of this fact (see Lemmas 9.3, 10.2 below).
Because of this, our argument could technically be called "elementary".
However we will need a certain amount of structural notation (of a somewhat
combinatorial nature) in order to compensate for the lack of an existing body
of notation such as is provided by the language of ergodic theory.
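Since in this finitary setting a $\sigma$-algebra on $\mathbf{Z}_N$ is just a partition into atoms, conditional expectation reduces to replacing a function by its average on each atom. A minimal Python sketch, assuming functions on $\mathbf{Z}_N$ are stored as lists indexed by $0,\ldots,N-1$ and atoms as lists of points (the function name is hypothetical):

```python
def conditional_expectation(f, atoms):
    """E(f | B), where the sigma-algebra B on Z_N is given by its atoms.

    f     : list with f[x] the value at x in Z_N = {0, ..., N-1}
    atoms : disjoint non-empty lists of points of Z_N whose union is Z_N
    Returns the function equal, on each atom, to the average of f over that atom.
    """
    g = [None] * len(f)
    for atom in atoms:
        avg = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            g[x] = avg
    return g

f = [1, 2, 3, 4, 5, 6]
atoms = [[0, 2, 4], [1, 3, 5]]  # even and odd residues mod 6
g = conditional_expectation(f, atoms)
print(g)  # [3.0, 4.0, 3.0, 4.0, 3.0, 4.0]
```

As expected, averaging a second time changes nothing, and the overall mean of $f$ is preserved.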

In writing this paper we encountered a certain trade-off between keeping
the paper brief, and keeping the paper well-motivated. We have opted primarily
for the latter; if one chose to strip away all the motivation and redundant
arguments from this paper one could in fact present a fairly brief proof
of Theorem 1.2 (roughly 20 pages in length); see [39]. We also had a similar
trade-off between keeping the arguments simple, and attempting to optimize the
growth of constants for $N_{SZ}(k,\delta)$ (which by the arguments here could be as bad
as double-Ackermann or even triple-Ackermann growth); since it seems clear
that the arguments here have no chance whatsoever to be competitive with the
bounds obtained by Gowers' Fourier-analytic proof [19] we have opted strongly
in favour of the former.

Remark 1.3. Because our argument uses similar ingredients to the ergodic theory
arguments, but in a quantitative finitary setting, it seems likely that one
could modify these arguments relatively easily to obtain quantitative finitary
versions of other ergodic theory recurrence results in the literature, such as those
in [11], [12], [13], [3], [4]. In many of these cases, the ordinary van der Waerden
theorem would have to be replaced by a more general result, but fortunately
such generalizations are known to exist (see e.g. [42] for further discussion). In
principle, the quantitative ergodic approach could in fact have a greater reach
than the traditional ergodic approach to these problems; for instance, the recent
establishment in [21] that the primes contain arbitrarily long arithmetic progressions
relied heavily on this quantitative ergodic point of view, and does not
seem at this point to have a proof by traditional ergodic methods (or indeed by
any of the other methods available for proving Szemeredi's theorem, although
the recent hypergraph approach of Gowers [20] and of Rodl and Skokan [26], [27]

³ Readers familiar with the Szemeredi regularity lemma may see parallels here with the
proof of that lemma. Indeed one can phrase the proof of this lemma in terms of conditional
expectation; see [38].
seems to have a decent chance of being "relativized" to pseudorandom sets such
as the "almost primes"). Indeed, some of the work used to develop this paper 
became incorporated into [21], and conversely some of the progress developed 
in [21] was needed to conclude this paper. 

Remark 1.4. It is certainly possible to avoid using van der Waerden's theorem
explicitly in our arguments, for instance by incorporating arguments similar to
those used in the proof of this theorem into the main argument⁴. A decreased
reliance on van der Waerden's theorem would almost certainly lead to better
bounds for $N_{SZ}(k,\delta)$; for instance the Fourier-analytic arguments of Gowers [18],
[19] avoid this theorem completely and obtain bounds for $N_{SZ}(k,\delta)$ which are
far better than those obtained by any other argument, including ours. However
this would introduce additional arguments into our proof which more properly
belong to the Ramsey-theoretic circle of ideas surrounding van der Waerden's
theorem, and so we have elected to proceed by the simpler and "purer" route of
using van der Waerden's theorem directly. Also, as remarked above, the argument
as presented here seems more able to extend to other recurrence problems.

Remark 1.5. Our proof of Szemeredi's theorem here is similar in spirit to the 
proof of the transference principle developed in [21] by Ben Green and the author 
which allowed one to deduce a Szemeredi theorem relative to a pseudorandom 
measure from the usual formulation of Szemeredi's theorem; this transference 
principle also follows the same basic scheme used to prove Szemeredi's theorem 
(with Szemeredi's theorem itself taking on the role of the structured recurrence 
theorem). Indeed, the two arguments were developed concurrently (and both 
were inspired, not only by each other, but by all four of the existing proofs 
of Szemeredi's theorem in the literature, as well as arguments from the much 
better understood $k = 3, 4$ cases); it may also be possible to combine the two to give
a more direct proof of Szemeredi's theorem relative to a pseudorandom measure.
There are two main differences however between our arguments here and those
in [21]. Firstly, in the arguments here no pseudorandom measure is present.
Secondly, the role of structure in [21] was played by the anti-uniform functions,
or more precisely a tower of $\sigma$-algebras constructed out of basic anti-uniform
functions. Our approach uses the same concept, but goes further by analyzing
the basic anti-uniform functions more carefully, and in fact concluding that such
functions are uniformly almost periodic⁵ of a certain order $k-2$.

⁴ This is to some extent done for instance in Furstenberg's original proof [9], [14]. A key
component of that proof was to show that the multiple recurrence property was preserved
under compact extensions. Although it is not made explicit in those papers, the argument
proceeds by "colouring" elements of the extension on each fiber, and using "colour focusing"
arguments closely related to those used to prove van der Waerden's theorem (see e.g. [42]).
The relevance of van der Waerden's theorem and its generalizations in the ergodic theory
approach is made more explicit in later papers, see e.g. [15], [3], [4], and also the discussion
in [42].

⁵ In [21] the only facts required concerning these basic anti-uniform functions were that they
were bounded, and that pseudorandom measures were uniformly distributed with respect to
any $\sigma$-algebra generated by such functions. This was basically because the argument in
[21] invoked Szemeredi's theorem as a "black box" to deal with this anti-uniform component,
whereas clearly this is not an option for our current argument.
2 The finite cyclic group setting

We now begin our new proof of Theorem 1.2. Following the abstract scheme 
outlined in the introduction, we should begin by specifying what objects we shall 
use as proxies for the set $A$. The answer is that we shall use non-negative
bounded functions $f: \mathbf{Z}_N \to \mathbf{R}^+$ on a cyclic group $\mathbf{Z}_N := \mathbf{Z}/N\mathbf{Z}$. In this
section we set out some basic notation for such functions, and reduce Theorem
1.2 to proving a certain quantitative recurrence property for these functions.

Remark 2.1. The above choice of object of study fits well with the Fourier- 
based proofs of Szemeredi's theorem in [30], [31], [18], [19], at least for the 
initial stages of the argument. However in those arguments one eventually
passes from $\mathbf{Z}_N$ to a smaller cyclic group $\mathbf{Z}_{N'}$ for which one has located a
density increment, iterating this process until randomness has been obtained
(or the density becomes so high that finding arithmetic progressions becomes
very easy). In contrast, we shall keep $N$ fixed and use the group $\mathbf{Z}_N$ throughout
the argument; it will be a certain family of $\sigma$-algebras which changes instead.
This parallels the ergodic theory argument [9], [14], [10], but also certain variants 
of the Fourier argument such as [5], [6]. It also fits well with the philosophy of 
proof of the Szemeredi regularity lemma. 

We now set up some notation. We fix a large prime number $N$, and fix
$\mathbf{Z}_N := \mathbf{Z}/N\mathbf{Z}$ to be the cyclic group of order $N$. We will assume that $N$ is
extremely large; basically, it will be larger than any quantity depending on any
of the other parameters which appear in the proof. We will write $O(X)$ for
a quantity bounded by $CX$ where $C$ is independent of $N$; if $C$ depends on
some other parameters (e.g. $k$ and $\delta$), we shall subscript the $O(X)$ notation
accordingly to indicate the dependence. Generally speaking we will order these
subscripts so that the extremely large or extremely small parameters are at the
right.
Definition 2.2. If $f: X \to \mathbf{C}$ is a function⁶, and $A$ is a finite non-empty subset
of $X$, we define the expectation of $f$ conditioning on $A$⁷ to be

$$\mathbf{E}(f|A) = \mathbf{E}(f(x)|x \in A) := \frac{1}{|A|} \sum_{x \in A} f(x),$$

where $|A|$ of course denotes the cardinality of $A$. If in particular $f$ is an indicator
function $f = 1_\Omega$ for some $\Omega \subseteq X$, thus $f(x) = 1$ when $x \in \Omega$ and
$f(x) = 0$ otherwise, we write $\mathbf{P}(\Omega|A)$ for $\mathbf{E}(1_\Omega|A)$. Similarly, if $P(x)$ is an event
depending on $x$, we write $\mathbf{P}(P|A)$ for $\mathbf{E}(1_P|A)$, where $1_P(x) := 1$ when $P(x)$ is
true and $1_P(x) := 0$ otherwise.
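Definition 2.2 translates directly into code. The following Python sketch (helper names hypothetical) represents $f$ and events as Python callables and $A$ as any finite iterable; it is only meant to make the normalisation $1/|A|$ and the indicator conventions explicit.

```python
def expectation(f, A):
    """E(f | A) = (1/|A|) * sum_{x in A} f(x) for a finite non-empty set A."""
    A = list(A)
    if not A:
        raise ValueError("A must be a finite non-empty set")
    return sum(f(x) for x in A) / len(A)

def prob_set(omega, A):
    """P(Omega | A) := E(1_Omega | A), the proportion of A lying in Omega."""
    return expectation(lambda x: 1 if x in omega else 0, A)

def prob_event(P, A):
    """P(P | A) := E(1_P | A) for a predicate P(x)."""
    return expectation(lambda x: 1 if P(x) else 0, A)

print(prob_set({0, 2, 4, 6, 8}, range(10)))  # 0.5
```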

We also adopt the following ergodic theory notation: if $f: \mathbf{Z}_N \to \mathbf{R}$ is a
function, we define the shifts $T^n f: \mathbf{Z}_N \to \mathbf{R}$ for any $n \in \mathbf{Z}_N$ or $n \in \mathbf{Z}$ by

$$T^n f(x) := f(x+n),$$

and similarly define $T^n \Omega$ for any $\Omega \subseteq \mathbf{Z}_N$ by $T^n \Omega := \Omega - n$, thus $T^n 1_\Omega = 1_{T^n \Omega}$.
Clearly these maps are algebra homomorphisms (thus $T^n(fg) = (T^n f)(T^n g)$
and $T^n(f+g) = T^n f + T^n g$), preserve constant functions, and also preserve
expectation (thus $\mathbf{E}(T^n f|\mathbf{Z}_N) = \mathbf{E}(f|\mathbf{Z}_N)$). They also form a group, thus
$T^{n+m} = T^n T^m$ and $T^0$ is the identity, and are unitary with respect to the
usual inner product $(f,g) := \mathbf{E}(f\overline{g})$. We shall also rely frequently⁸ on the
Banach algebra norm

$$\|f\|_{L^\infty} := \sup_{x \in \mathbf{Z}_N} |f(x)|$$

and the Hilbert space structure

$$(f,g) := \mathbf{E}(f\overline{g}); \qquad \|f\|_{L^2} := (f,f)^{1/2} = \mathbf{E}(|f|^2)^{1/2};$$

later on we shall also introduce a number of other useful norms, in particular
the Gowers uniformity norms $U^{k-1}$ and the uniform almost periodicity norms
$UAP^{k-2}$.
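The shift operators and the two basic norms can be sketched in a few lines of Python, assuming a function on $\mathbf{Z}_N$ is stored as a list of its $N$ values (all names are hypothetical). The sketch checks the identity $T^n 1_\Omega = 1_{T^n\Omega}$ with $T^n\Omega = \Omega - n$, and the accompanying checks confirm the group law and unitarity numerically on one example.

```python
import cmath
import math

def shift(f, n):
    """T^n f(x) := f(x + n), with f stored as a list on Z_N."""
    N = len(f)
    return [f[(x + n) % N] for x in range(N)]

def shift_set(omega, n, N):
    """T^n Omega := Omega - n, computed in Z_N."""
    return {(y - n) % N for y in omega}

def indicator(omega, N):
    return [1 if x in omega else 0 for x in range(N)]

def inner(f, g):
    """(f, g) := E(f * conj(g)) over Z_N."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / len(f)

def norm_linf(f):
    return max(abs(a) for a in f)

def norm_l2(f):
    return math.sqrt(inner(f, f).real)

N = 7
omega = {1, 2, 5}
assert shift(indicator(omega, N), 3) == indicator(shift_set(omega, 3, N), N)
f = [cmath.exp(2j * math.pi * x / N) for x in range(N)]  # the character e(x/N)
print(norm_linf(f), norm_l2(f))  # both equal 1 up to rounding
```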

To prove Theorem 1.2, it will suffice to prove the following quantitative 
recurrence version of that theorem. 

⁶ Strictly speaking, we could give the entire proof of Theorem 1.2 using only real-valued
functions rather than complex-valued ones, as is done in the ergodic theory proofs, thus making
the proof slightly more elementary and also allowing for some minor simplifications in the
notation and arguments. However, allowing the functions to be complex-valued allows us
to draw more parallels with Fourier analysis, and in particular to discuss such interesting
examples of functions as (1).

⁷ We have deliberately chosen this notation to coincide with the usual notations of probability
$\mathbf{P}(\Omega)$ and expectation $\mathbf{E}(f)$ for random variables to emphasize the probabilistic nature of many
of our arguments, and indeed we will also combine this notation with the probabilistic one
(and take advantage of the fact that both forms of expectation commute with each other).
Note that one can think of $\mathbf{E}(f(x)|x \in A)$ as the conditional expectation of $f(x)$, where $x$ is
a random variable with the uniform distribution on $X$, conditioning on the event $x \in A$.

⁸ Of course, since the space of functions on $\mathbf{Z}_N$ is finite-dimensional, all norms are equivalent
up to factors depending on $N$. However in line with our philosophy that we only wish to
consider quantities which are bounded uniformly in $N$, we think of these norms as being
genuinely distinct.
Definition 2.3. A function $f: \mathbf{Z}_N \to \mathbf{C}$ is said to be bounded if we have

$$\|f\|_{L^\infty} \le 1.$$

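Theorem 2.4 (Quantitative recurrence form of Szemeredi's theorem).
For any integer $k \ge 1$, any large prime integer $N \ge 1$, any $0 < \delta \le 1$, and any
non-negative bounded function $f: \mathbf{Z}_N \to \mathbf{R}^+$ with

$$\mathbf{E}(f|\mathbf{Z}_N) \ge \delta \tag{2}$$

we have

$$\mathbf{E}\Big(\prod_{j=0}^{k-1} T^{jr} f(x) \,\Big|\, x, r \in \mathbf{Z}_N\Big) \ge c(k,\delta) \tag{3}$$

for some $c(k,\delta) > 0$.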
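The left-hand side of (3) can be evaluated naively in $O(kN^2)$ operations, which is only feasible for small $N$. In the Python sketch below (hypothetical name `ap_average`), $f = 1_A$ for a set $A$ makes the average count pairs $(x,r) \in \mathbf{Z}_N^2$ with $x, x+r, \ldots, x+(k-1)r \in A$, divided by $N^2$. The trivial pairs with $r = 0$ contribute only $|A|/N^2 \le 1/N$, which is why a lower bound $c(k,\delta)$ independent of $N$ is a genuine statement about non-trivial progressions.

```python
def ap_average(f, k):
    """Left-hand side of (3): E( prod_{j=0}^{k-1} T^{jr} f(x) | x, r in Z_N ),
    computed naively for f stored as a list on Z_N."""
    N = len(f)
    total = 0
    for x in range(N):
        for r in range(N):
            term = 1
            for j in range(k):
                term *= f[(x + j * r) % N]
            total += term
    return total / N**2

N = 7
f = [1 if x in {0, 1, 2} else 0 for x in range(N)]  # f = 1_A with A = {0, 1, 2}
# 3 trivial pairs (x, 0) plus (0, 1) and (2, -1), so the average equals 5/49
print(ap_average(f, 3))
```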

Remark 2.5. This is the form of Szemeredi's theorem required in [21]. This result
was then generalized in [21] (introducing a small error $o_{k,\delta}(1)$) by replacing
the hypothesis that $f$ was bounded by the more general hypothesis that $f$
was pointwise dominated by a pseudorandom measure. This generalization was
crucial to obtain arbitrarily long progressions in the primes. We will not seek
such generalizations here, although we do remark that the arguments in [21]
closely parallel the ones here.

We now show how the above theorem implies Theorem 1.2. 

Proof of Theorem 1.2 assuming Theorem 2.4. Fix $k, \delta$. Let $N \ge 1$ be large,
and suppose that $A \subset \{1,\ldots,N\}$ has cardinality $|A| \ge \delta N$. By Bertrand's
postulate, we can find a large prime number $N'$ between $kN$ and $2kN$. We
embed $\{1,\ldots,N\}$ in $\mathbf{Z}_{N'}$ in the usual manner, and let $A'$ be the image of $A$
under this embedding. Then we have $\mathbf{E}(1_{A'}|\mathbf{Z}_{N'}) \ge \delta/(2k)$, and hence by (3)

$$\mathbf{E}\Big(\prod_{j=0}^{k-1} T^{jr} 1_{A'}(x) \,\Big|\, x, r \in \mathbf{Z}_{N'}\Big) \ge c(k,\delta/(2k)),$$

or equivalently

$$|\{(x,r) \in \mathbf{Z}_{N'}^2 : x, x+r, \ldots, x+(k-1)r \in A'\}| \ge c(k,\delta/(2k))\,(N')^2.$$

Since $N' \ge kN$ and $A' \subset \{1,\ldots,N\}$, we see that $1 \le x \le N$ and $-N \le r \le N$ in
the above set (taking the integer representative of $r$), and the progression
$x, x+r, \ldots, x+(k-1)r$ does not wrap around modulo $N'$. Also we may remove
the $r = 0$ component of this set since this contributes at most $N$ to the above
sum. If $N$ is large enough, the right-hand side is still positive, and this implies
that $A$ contains a progression $x, x+r, \ldots, x+(k-1)r$, as desired. □
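The embedding step can be checked on a toy example. The Python sketch below (all names hypothetical; trial division stands in for any primality test) picks a prime $N'$ in $[kN, 2kN]$, lists the pairs $(x,r)$ with $r \neq 0$ whose progression lies in $A'$ modulo $N'$, and confirms that each such pair lifts to a genuine integer progression in $A$ via the integer step $((x+r) \bmod N') - x$. This only illustrates the no-wrap-around claim on one small instance; it does not prove it.

```python
def is_prime(n):
    """Trial division; adequate for the small N' used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_between(lo, hi):
    """Smallest prime in [lo, hi]; Bertrand's postulate gives one when hi = 2 * lo >= 2."""
    for p in range(lo, hi + 1):
        if is_prime(p):
            return p
    raise ValueError("no prime in range")

def progressions_mod(A, k, Np):
    """Pairs (x, r) in Z_{N'}^2 with r != 0 and x, x + r, ..., x + (k-1) r in A' (mod N')."""
    S = {a % Np for a in A}
    return [(x, r) for x in sorted(S) for r in range(1, Np)
            if all((x + j * r) % Np in S for j in range(k))]

def integer_step(x, r, Np):
    """Integer representative ((x + r) mod N') - x of r; |step| < N when x, x + r lie in {1, ..., N}."""
    return (x + r) % Np - x

N, k = 20, 3
A = {1, 4, 5, 7, 10, 13, 16, 19}
Np = prime_between(k * N, 2 * k * N)  # 61
for x, r in progressions_mod(A, k, Np):
    step = integer_step(x, r, Np)
    # no wrap-around: the progression is a genuine progression of integers in A
    assert all(x + j * step in A for j in range(k))
```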

Remark 2.6. One can easily reverse this implication and deduce Theorem 2.4
from Theorem 1.2; the relevant argument was first worked out by Varnavides
[41]. In the ergodic theory proofs, Szemeredi's theorem is also stated in a form
similar to (3), but with $\mathbf{Z}_N$ replaced by an arbitrary measure-preserving system
(and $r$ averaged over some interval $\{1,\ldots,N\}$ going to infinity), and the left-hand
side was then shown to have positive limit inferior, rather than being
bounded from below by some explicit $c(k,\delta)$. However these changes are minor,
and again it is easy to pass from one statement to the other, at least with the
aid of the axiom of choice (see [14] for some further discussion on this issue).

It remains to deduce Theorem 2.4. This task shall occupy the remainder of 
the paper. 

3 Overview of proof 

We shall begin by presenting the high-level proof of Theorem 2.4, implementing 
the abstract scheme outlined in the introduction. 

One of the first tasks is to define measures of randomness and structure in
the function $f$. We shall do this by means of two families of norms⁹: the Gowers
uniformity norms

$$\|f\|_{U^0} \le \|f\|_{U^1} \le \ldots \le \|f\|_{U^{k-1}} \le \ldots \le \|f\|_{L^\infty}$$

introduced in [19] (and studied further in [23], [21]) and a new family of norms,
the uniform almost periodicity norms

$$\|f\|_{UAP^0} \ge \|f\|_{UAP^1} \ge \ldots \ge \|f\|_{UAP^{k-1}} \ge \ldots \ge \|f\|_{L^\infty}$$

which turn out to be somewhat dual to the Gowers uniformity norms. We shall
mainly rely on the $U^{k-1}$ and $UAP^{k-2}$ norms; the other norms in the family are
required only for mathematical induction purposes. We shall define the Gowers
uniformity and uniform almost periodicity norms rigorously in Sections 4 and 5
respectively. For now, we shall simply give a very informal (and only partially
accurate) heuristic: a function bounded in $UAP^{k-2}$ will typically look something
like the polynomially quasiperiodic function (1) where all the polynomials
have degree at most $k-2$, whereas a function small in $U^{k-1}$ is something like
a function which is "orthogonal" to all such quasiperiodic functions (1).
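The heuristic can be tested numerically in the case $k = 3$, where it pairs $U^2$ with quasiperiodic functions of degree at most 1. The Python sketch below builds functions of the form (1) and evaluates the $U^2$ norm by brute force. It uses the standard Gowers formula $\|f\|_{U^2}^4 = \mathbf{E}_{x,h_1,h_2} f(x)\overline{f(x+h_1)}\,\overline{f(x+h_2)}f(x+h_1+h_2)$, which is an assumption here since the paper's precise definitions are deferred to Section 4; all function names are hypothetical. A linear phase has $U^2$ norm 1, while the quadratic phase $e(x^2/N)$, which is quasiperiodic only of order 2, has $U^2$ norm $N^{-1/4}$ for odd prime $N$ (the inner product telescopes to $e(2h_1h_2/N)$).

```python
import cmath
import itertools
import math

def e(t):
    """e(t) := exp(2 pi i t)."""
    return cmath.exp(2j * math.pi * t)

def quasiperiodic(N, coeffs, polys):
    """F(x) = (1/J) sum_{j=1}^J c_j e(P_j(x)/N) as in (1).

    coeffs : the complex numbers c_j, with |c_j| <= 1
    polys  : each P_j as a coefficient list [a_0, a_1, ...] with a_i in Z_N
    """
    J = len(coeffs)

    def P(p, x):
        return sum(a * pow(x, i, N) for i, a in enumerate(p)) % N

    return [sum(c * e(P(p, x) / N) for c, p in zip(coeffs, polys)) / J
            for x in range(N)]

def u2_norm(f):
    """Brute-force U^2 norm via the standard Gowers formula (O(N^3) terms)."""
    N = len(f)
    total = 0
    for x, h1, h2 in itertools.product(range(N), repeat=3):
        total += (f[x] * f[(x + h1) % N].conjugate()
                  * f[(x + h2) % N].conjugate() * f[(x + h1 + h2) % N])
    return max(total.real / N**3, 0.0) ** 0.25  # clamp tiny rounding errors

N = 11
linear = quasiperiodic(N, [1], [[0, 3]])        # e(3x/N), degree 1
quadratic = quasiperiodic(N, [1], [[0, 0, 1]])  # e(x^2/N), degree 2
print(u2_norm(linear), u2_norm(quadratic), N ** -0.25)
```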

Next, we state the three main sub-theorems which we shall use to deduce 
Theorem 2.4. The first sub-theorem, which is rather standard (and the easiest 
of the three to prove), asserts that Gowers-uniform functions (i.e. functions 
with small $U^{k-1}$ norm) are negligible for the purposes of computing (3); it will
be proven in Section 4. 

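Theorem 3.1 (Generalized von Neumann theorem). [19] Let $k \ge 2$,
and let $\lambda_0, \ldots, \lambda_{k-1}$ be distinct elements of $\mathbf{Z}_N$. Then for any bounded functions
$f_0, \ldots, f_{k-1}: \mathbf{Z}_N \to \mathbf{C}$ we have

$$\Big|\mathbf{E}\Big(\prod_{j=0}^{k-1} T^{\lambda_j r} f_j(x) \,\Big|\, x, r \in \mathbf{Z}_N\Big)\Big| \le \min_{0 \le j \le k-1} \|f_j\|_{U^{k-1}}.$$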
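Theorem 3.1 can be sanity-checked numerically in the smallest non-trivial case $k = 3$, where the bound involves the $U^2$ norm. The Python sketch below draws random functions with values in the closed unit disk, computes the left-hand side for several triples of distinct shifts $\lambda_j$ in $\mathbf{Z}_{13}$, and compares it with $\min_j \|f_j\|_{U^2}$. The $U^2$ norm is computed with the standard Gowers formula (an assumption ahead of the rigorous definition in Section 4), and all helper names are hypothetical. This is a check on random examples, not a proof.

```python
import cmath
import itertools
import math
import random

def u2_norm(f):
    """Standard Gowers U^2 norm, by brute force over (x, h1, h2)."""
    N = len(f)
    total = 0
    for x, h1, h2 in itertools.product(range(N), repeat=3):
        total += (f[x] * f[(x + h1) % N].conjugate()
                  * f[(x + h2) % N].conjugate() * f[(x + h1 + h2) % N])
    return max(total.real / N**3, 0.0) ** 0.25

def multilinear_average(fs, lambdas):
    """E( prod_j T^{lambda_j r} f_j(x) | x, r in Z_N ) for lists f_j on Z_N."""
    N = len(fs[0])
    total = 0
    for x in range(N):
        for r in range(N):
            term = 1
            for f, lam in zip(fs, lambdas):
                term *= f[(x + lam * r) % N]
            total += term
    return total / N**2

def random_bounded(N, rng):
    """Random f: Z_N -> C with |f(x)| <= 1 (values in the closed unit disk)."""
    return [rng.random() * cmath.exp(2j * math.pi * rng.random()) for _ in range(N)]

rng = random.Random(0)
N = 13  # prime
for lambdas in [(0, 1, 2), (0, 3, 7), (5, 1, 11)]:  # k = 3, distinct shifts
    fs = [random_bounded(N, rng) for _ in lambdas]
    lhs = abs(multilinear_average(fs, lambdas))
    rhs = min(u2_norm(f) for f in fs)
    print(lambdas, round(lhs, 4), "<=", round(rhs, 4))
```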

⁹ Strictly speaking, the $U^0$ and $U^1$ norms are not actually norms, and the $UAP^0$ norm can
be infinite when $f$ is non-constant. However, these issues will be irrelevant for our proof, and
in the most interesting case $k \ge 3$ there are no such degeneracies.
Remark 3.2. As indicated, this part of the argument is based on the arguments
of Gowers [19]; however it is purely combinatorial, relying on the Cauchy-Schwarz
inequality rather than on Fourier analytic techniques (which occupy
other parts of the argument in [19]). Variants of this theorem go back at least
as far as Furstenberg [9]; see also [21], [23] for some variants of this theorem.
We remark that the linear shifts $\lambda_j r$ can be replaced by more general objects
such as polynomial shifts, after replacing the $U^{k-1}$ norm by a higher Gowers
uniformity norm; this is implicit for instance in [3].

The second sub-theorem is a special case of the main theorem, and addresses
the complementary situation to Theorem 3.1, where $f$ is now uniformly almost
periodic instead of Gowers-uniform; it will be proven in Section 10.

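Theorem 3.3 (Almost periodic functions are recurrent). Let $d \ge 0$ and
$k \ge 1$ be integers, and let $f_{U^\perp}, f_{UAP}$ be non-negative bounded functions such
that we have the estimates

$$\|f_{U^\perp} - f_{UAP}\|_{L^2} \le \frac{\delta^2}{1024k} \qquad (4)$$

$$\mathbb{E}(f_{U^\perp} \mid \mathbb{Z}_N) \ge \delta \qquad (5)$$

$$\|f_{UAP}\|_{UAP^d} \le M \qquad (6)$$

for some $0 < \delta, M < \infty$. Then we have

$$\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{j\mu r} f_{U^\perp}(x) \,\Big|\, x \in \mathbb{Z}_N;\ 0 \le r \le N_1\Big) \ge c_0(d, k, \delta, M) \qquad (7)$$

for some $c_0(d, k, \delta, M) > 0$ and all $\mu \in \mathbb{Z}_N$ and $N_1 \ge 1$.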

Remark 3.4. This argument is a quantitative version of certain ergodic theory
arguments by Furstenberg and later authors, and is the only place where the
van der Waerden theorem (Theorem 1.1) is required. It is by far the hardest
component of the argument. In principle, the argument gives explicit bounds
for $c_0(d, k, \delta, M)$, but they rely (repeatedly) on Theorem 1.1 and are thus quite
weak. As mentioned earlier, we need this theorem only when $d = k-2$, but
allowing $d$ to be arbitrary is convenient for the purposes of proving this theorem
by induction. It is important that the quantity $\delta^2/(1024k)$ used in the right-hand side
of (4) does not depend on $M$. This significantly complicates the task of proving
this theorem when $M$ is large, of course, since the error between $f_{U^\perp}$ and $f_{UAP}$
may seem to dominate whatever gain one can obtain from (6). Nevertheless,
one can cope with such large errors by means of the machinery of σ-algebras
and conditional expectation. This ability to tolerate reasonably large $L^2$ errors
in this recurrence result is also crucially exploited in the "Zorn's lemma" step
in the ergodic theory arguments, in which one shows that the limit of a chain
of extensions with the recurrence property is also recurrent. The parameters
$\mu$, $N_1$ are technical and are needed to facilitate the inductive argument used to
prove this theorem; ultimately we shall take $\mu := 1$ and $N_1 := N-1$.



Finally, we need a structure theorem, proven in Section 8, that splits an
arbitrary function into a Gowers-uniform component and a uniformly almost
periodic component (plus an error).

Theorem 3.5 (Structure theorem). Let $k \ge 3$, and let $f$ be a non-negative
bounded function obeying (2) for some $\delta > 0$. Then we can find a positive number
$M = O_{k,\delta}(1)$, a bounded function $f_U$, and non-negative bounded functions
$f_{U^\perp}, f_{UAP}$ such that we have the splitting

$$f = f_U + f_{U^\perp}$$

and the estimates (4), (5), (6) with $d := k-2$, as well as the uniformity estimate

$$\|f_U\|_{U^{k-1}} \le 2^{-k} c_0(k-2, k, \delta, M) \qquad (8)$$

where $c_0(d, k, \delta, M)$ is the quantity in (7).

Remark 3.6. The subscripts $U$ and $U^\perp$ stand for Gowers uniform and Gowers
anti-uniform respectively. Thus this theorem asserts that while a general
function $f$ need not have any uniformity properties whatsoever, it can be
decomposed into pieces which are either uniform in the sense of Gowers, or are
instead uniformly almost periodic, or are simply small in $L^2$. This theorem is
something of a hybrid between the Furstenberg structure theorem [14] and the
Szemeredi regularity lemma [37]. A similar structure theorem was a key
component of [21]. One remarkable fact here is that we could replace the quantity
on the right-hand side of (8) by an arbitrary positive function of $k, \delta, M$, at the
cost of worsening the upper bound on $M$. The fact that the error tolerance in
(4) does not go to zero as $M \to \infty$ is crucial in order to obtain this insensitivity
to the choice of right-hand side of (8).

Remark 3.7. Each of the above three theorems has strong parallels in the
genuinely ergodic theory setting. For instance, the analogues of the $U^d$ norms
in that setting were worked out by Host and Kra [23], where the analogue of
Theorem 3.1 was also (essentially) proven. The structure theorem seems to
correspond to the recent discovery by Ziegler [43] of a universal characteristic
factor for Szemeredi-type recurrence properties, but with the role of the almost
periodic functions of order $k-2$ replaced by the notion of a $(k-2)$-step nilsystem.
The recurrence theorem is very similar in spirit to $k-2$ iterations of the basic
fact, established in [14], that recurrence properties are preserved under compact
extensions (although our proof is not based on that argument, but instead on
later colouring arguments such as the one in [3]). One can also extend the
definition of the Banach algebra $UAP^d$ defined below to the ergodic theory setting.
It seems of interest to pursue these connections further, and in particular to
rigorously pin down the relationship between almost periodicity of order $k-2$
and $(k-2)$-step nilsystems.

Assuming these three theorems, we can now quickly conclude Theorem 2.4. 

Proof of Theorem 2.4. Let $f, k, \delta$ be as in Theorem 2.4. We may take $k \ge 3$
since the cases $k = 1, 2$ are trivial. Let $M, f_U, f_{U^\perp}, f_{UAP}$ be as in Theorem
3.5. We can then split the left-hand side of (3) as the sum of $2^k$ terms of the
form $\mathbb{E}(\prod_{j=0}^{k-1} T^{jr} f_j(x) \mid x, r \in \mathbb{Z}_N)$, where each of the functions $f_0, \ldots, f_{k-1}$ is
equal to either $f_U$ or $f_{U^\perp}$. The term in which all the $f_j$ are equal to $f_{U^\perp}$ is at
least $c_0(k-2, k, \delta, M)$ by Theorem 3.3 (taking $\mu := 1$ and $N_1 := N-1$). The
other $2^k - 1$ terms have magnitude at most $\|f_U\|_{U^{k-1}} \le 2^{-k} c_0(k-2, k, \delta, M)$
thanks to Theorem 3.1. Adding all this together we see that

$$\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{jr} f(x) \,\Big|\, x, r \in \mathbb{Z}_N\Big) \ge 2^{-k} c_0(k-2, k, \delta, M).$$

Since $M = O_{k,\delta}(1)$, the claim (3) follows. □
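The splitting into $2^k$ terms used above is just the multilinearity of the average in (3). As an illustration only, the following Python sketch checks this bookkeeping numerically. The helpers `progression_average`, `sum_of_pieces` and `worst_case_lower_bound` are hypothetical names, and the randomly generated `f_u`, `f_uperp` merely stand in for $f_U$, $f_{U^\perp}$ (they are not produced by Theorem 3.5). The last helper records the arithmetic $c_0 - (2^k - 1) 2^{-k} c_0 = 2^{-k} c_0$.

```python
import itertools
import random

def progression_average(fs):
    """E(prod_j T^{jr} f_j(x) | x, r in Z_N) for fs = [f_0, ..., f_{k-1}]."""
    N = len(fs[0])
    total = 0.0
    for x in range(N):
        for r in range(N):
            term = 1.0
            for j, f in enumerate(fs):
                term *= f[(x + j * r) % N]  # T^{jr} f_j(x) = f_j(x + jr)
            total += term
    return total / N ** 2

def sum_of_pieces(f_u, f_uperp, k):
    """Sum of the 2^k averages in which each slot holds either f_U or f_{U-perp}."""
    return sum(
        progression_average([f_u if pick else f_uperp for pick in picks])
        for picks in itertools.product((True, False), repeat=k)
    )

def worst_case_lower_bound(c0, k):
    """c0 minus 2^k - 1 error terms, each of size at most 2^{-k} c0."""
    return c0 - (2 ** k - 1) * 2 ** -k * c0

# Hypothetical stand-ins for f_U and f_{U-perp} on Z_11.
random.seed(2)
N, k = 11, 3
f_u = [random.uniform(-0.2, 0.2) for _ in range(N)]
f_uperp = [random.uniform(0.0, 0.8) for _ in range(N)]
f = [a + b for a, b in zip(f_u, f_uperp)]
print(progression_average([f] * k), sum_of_pieces(f_u, f_uperp, k))  # agree up to rounding
```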

It remains to define the $U^{k-1}$ and $UAP^{k-2}$ norms properly, and prove Theorems
3.1, 3.3, 3.5. This shall occupy the remainder of the paper.



4 Uniformity norms, and the generalized von Neumann theorem

In this section we define the Gowers uniformity norms $U^d$ properly, and then
prove Theorem 3.1. The motivation for these norms comes from the van der
Corput lemma, which is very simple in the context of the cyclic group $\mathbb{Z}_N$:

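Lemma 4.1 (Van der Corput lemma). For any function $f : \mathbb{Z}_N \to \mathbb{C}$, we
have

$$|\mathbb{E}(f \mid \mathbb{Z}_N)|^2 = \mathbb{E}\big(\mathbb{E}(\overline{f}\, T^h f \mid \mathbb{Z}_N) \,\big|\, h \in \mathbb{Z}_N\big).$$

Proof. Expanding both sides, the identity becomes

$$\mathbb{E}\big(\overline{f(x)}\, f(y) \,\big|\, x, y \in \mathbb{Z}_N\big) = \mathbb{E}\big(\overline{f(x)}\, f(x+h) \,\big|\, x, h \in \mathbb{Z}_N\big)$$

and the claim follows by the substitution $y = x + h$. □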
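Since Lemma 4.1 drives everything that follows, it may help to see it checked numerically. The following Python sketch is an illustration under our own conventions, not part of the argument: a function on $\mathbb{Z}_N$ is modelled as a list of $N$ complex numbers indexed by $0, \ldots, N-1$, and `expect`, `shift` and `van_der_corput_sides` are hypothetical helper names.

```python
import random

def expect(values):
    """Uniform average over a finite collection, i.e. E(. | Z_N)."""
    values = list(values)
    return sum(values) / len(values)

def shift(f, h):
    """The shift T^h f(x) = f(x + h), with x + h taken mod N."""
    N = len(f)
    return [f[(x + h) % N] for x in range(N)]

def van_der_corput_sides(f):
    """Return (|E(f)|^2, E(E(conj(f) T^h f | Z_N) | h in Z_N))."""
    N = len(f)
    lhs = abs(expect(f)) ** 2
    rhs = expect(
        expect(complex(a).conjugate() * b for a, b in zip(f, shift(f, h)))
        for h in range(N)
    )
    return lhs, rhs

random.seed(0)
f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(13)]
lhs, rhs = van_der_corput_sides(f)
print(lhs, rhs)  # the two sides agree up to rounding error
```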
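Motivated by this lemma, we define

Definition 4.2 (Gowers uniformity norms). [19] Let $f : \mathbb{Z}_N \to \mathbb{C}$ be a
function. We define the $d$-th Gowers uniformity norm $\|f\|_{U^d}$ recursively by

$$\|f\|_{U^0} := \mathbb{E}(f \mid \mathbb{Z}_N) \qquad (9)$$

and

$$\|f\|_{U^d} := \mathbb{E}\big(\|\overline{f}\, T^h f\|_{U^{d-1}}^{2^{d-1}} \,\big|\, h \in \mathbb{Z}_N\big)^{1/2^d} \qquad (10)$$

for all $d \ge 1$.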
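The recursion (9), (10) translates directly into code. The sketch below is a naive, exponential-cost illustration, usable only for small $N$ and $d$; it assumes that functions on $\mathbb{Z}_N$ are Python lists, and `gowers_norm` is a hypothetical name. The usage lines compare the $U^1$ norm with the explicit formula $|\mathbb{E}(f \mid \mathbb{Z}_N)|$ and display the monotonicity of the $U^d$ norms in $d$ (see footnote 10).

```python
import random

def gowers_norm(f, d):
    """||f||_{U^d} on Z_N, computed naively from the recursion (9), (10).

    f is a list of N numbers with f[x] = f(x); the cost grows like N^(d+1).
    """
    N = len(f)
    if d == 0:
        return sum(f) / N  # (9): the plain average, possibly complex
    total = 0
    for h in range(N):
        # the "multiplicative derivative" conj(f) * T^h f
        g = [complex(f[x]).conjugate() * f[(x + h) % N] for x in range(N)]
        total += gowers_norm(g, d - 1) ** (2 ** (d - 1))
    average = (total / N).real  # real for d >= 1, up to rounding error
    return max(average, 0.0) ** (1.0 / 2 ** d)

random.seed(1)
f = [random.uniform(-1, 1) for _ in range(7)]
norms = [gowers_norm(f, d) for d in (1, 2, 3)]
print(norms)  # non-decreasing in d, and at most 1 since |f| <= 1
```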

Example 4.3. From Lemma 4.1, (9), (10) we obtain the explicit formula

$$\|f\|_{U^1} = |\mathbb{E}(f \mid \mathbb{Z}_N)|. \qquad (11)$$

In particular, the $U^1$ norm (and hence all higher norms) are always non-negative.
The $U^2$ norm can also be interpreted as the $l^4$ norm of the Fourier coefficients
of $f$ via the identity

$$\|f\|_{U^2} = \Big(\sum_{\xi \in \mathbb{Z}_N} \big|\mathbb{E}(f(x)\, e(-x\xi/N) \mid x \in \mathbb{Z}_N)\big|^4\Big)^{1/4}, \qquad (12)$$

though we will not need this fact here. The higher $U^d$ norms do not seem to have
any particularly useful Fourier-type representations; however, by expanding (10)
out recursively one can write the $U^d$ norm as a sum of $f$ over $d$-dimensional
cubes (see [19], [21], [23] for further discussion of this).
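The identity (12) can likewise be confirmed numerically. In this sketch (illustrative only; `gowers_u2` and `fourier_l4` are hypothetical names) the Fourier coefficients are normalized as in (12), namely $\hat f(\xi) = \mathbb{E}(f(x) e(-x\xi/N) \mid x \in \mathbb{Z}_N)$ with $e(\theta) := e^{2\pi i \theta}$.

```python
import cmath
import random

def e(theta):
    """e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)

def gowers_u2(f):
    """||f||_{U^2} from (10) with d = 2, using (11) for the inner U^1 norms."""
    N = len(f)
    total = 0.0
    for h in range(N):
        m = sum(complex(f[x]).conjugate() * f[(x + h) % N] for x in range(N)) / N
        total += abs(m) ** 2  # ||conj(f) T^h f||_{U^1}^2
    return (total / N) ** 0.25

def fourier_l4(f):
    """Right-hand side of (12): the l^4 norm of the Fourier coefficients of f."""
    N = len(f)
    coeffs = [sum(f[x] * e(-x * xi / N) for x in range(N)) / N for xi in range(N)]
    return sum(abs(c) ** 4 for c in coeffs) ** 0.25

random.seed(2)
f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(11)]
print(gowers_u2(f), fourier_l4(f))  # equal up to rounding error
```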

Remark 4.4. The $U^0$ and $U^1$ norms are not, strictly speaking, norms; the latter
is merely a semi-norm, and the former is not a norm at all. However, the higher
norms $U^d$, $d \ge 2$ are indeed norms (they are homogeneous, non-degenerate,
and obey the triangle inequality), and are also related to a certain $2^d$-linear
inner product; see [19], [21], or [23] for a proof of these facts (which we will
not need here), with the $d = 2$ case following directly from inspection of (12).
Also one can show the inequality $\|f\|_{U^d} \le \|f\|_{U^{d+1}}$ for any $d > 0$. Thus for
$k > 2$, we have a rather interesting nested sequence of Banach spaces $U^{k-1}$ of
functions $f : \mathbb{Z}_N \to \mathbb{C}$, equipped with the $U^{k-1}$ norm; these Banach spaces
and their duals $(U^{k-1})^*$ were explored to a limited extent in [21], and we shall
continue their study later in this paper. Functions which are small in $U^2$ norm
are termed linearly uniform or Gowers-uniform of order 1, and thus have small
Fourier coefficients by (12); functions small in $U^3$ norm are quadratically uniform
or Gowers-uniform of order 2, and so forth. The terminology here is partly
explained by the next example; again, see [19], [21], or [23] for further discussion.

Example 4.5. By induction (footnote 10) we see that $\|f\|_{U^d} \le \|f\|_{L^\infty}$ for all $d$; in particular
we have $\|f\|_{U^d} \le 1$ when $f$ is bounded. We now present an example (which is,
in fact, the only example up to scalar multiplication) in which equality holds.
Let $P : \mathbb{Z}_N \to \mathbb{Z}_N$ be a polynomial with coefficients in $\mathbb{Z}_N$, and let $f(x) :=
e(P(x)/N)$. Then one can show that $\|f\|_{U^d} = 1$ when $d > \deg(P)$, and $\|f\|_{U^d} =
o_{\deg(P)}(1)$ when $d \le \deg(P)$; the former fact can be proven by induction and the
trivial observation that for each fixed $h$, the polynomial $P(x+h) - P(x)$ has
degree at most $\deg(P) - 1$, while the latter fact also follows from induction, the
above observation, and Lemma 4.1; we omit the details. In fact one can improve
the $o_{\deg(P)}(1)$ bound to $O_{\deg(P)}(N^{-1/2^d})$ by using the famous Weil estimates. By
using the triangle inequality for $U^d$ (see e.g. [19], [21]) one can also deduce
similar statements for the polynomially quasiperiodic functions (1).
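For a concrete instance of Example 4.5, take $N = 13$ and $P(x) = x^2$. A short computation gives $\|f\|_{U^1} = N^{-1/2}$ (a Gauss sum) and $\|f\|_{U^2} = N^{-1/4}$ (since $\overline{f}\, T^h f$ is a non-trivial linear phase for $h \ne 0$), while $\|f\|_{U^3} = 1$. The sketch below, which reuses a naive implementation of the recursion (9), (10) under hypothetical names, reproduces these values.

```python
import cmath

def e(theta):
    """e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)

def gowers_norm(f, d):
    """Naive ||f||_{U^d} from the recursion (9), (10); f is a list of length N."""
    N = len(f)
    if d == 0:
        return sum(f) / N
    total = 0
    for h in range(N):
        g = [complex(f[x]).conjugate() * f[(x + h) % N] for x in range(N)]
        total += gowers_norm(g, d - 1) ** (2 ** (d - 1))
    return max((total / N).real, 0.0) ** (1.0 / 2 ** d)

N = 13  # a small prime
quadratic = [e((x * x % N) / N) for x in range(N)]  # f(x) = e(P(x)/N) with P(x) = x^2
for d in (1, 2, 3):
    print(d, gowers_norm(quadratic, d))
# up to rounding: N^(-1/2) for d = 1, N^(-1/4) for d = 2, and 1 for d = 3 > deg(P)
```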

One can easily verify by induction that the $U^d$ norms are invariant under
shifts, thus $\|T^n f\|_{U^d} = \|f\|_{U^d}$, and also invariant under dilations, thus if $\lambda \in
\mathbb{Z}_N \setminus \{0\}$ and $f_\lambda(x) := f(x/\lambda)$ then $\|f_\lambda\|_{U^d} = \|f\|_{U^d}$.
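Both invariances are easy to test. In the sketch below (illustrative; `shifted` and `dilated` are hypothetical helpers, and `pow(lam, -1, N)`, the built-in modular inverse, requires Python 3.8 or later) the norms of $f$, $T^n f$ and $f_\lambda$ agree up to rounding.

```python
import random

def gowers_norm(f, d):
    """Naive ||f||_{U^d} from the recursion (9), (10)."""
    N = len(f)
    if d == 0:
        return sum(f) / N
    total = 0
    for h in range(N):
        g = [complex(f[x]).conjugate() * f[(x + h) % N] for x in range(N)]
        total += gowers_norm(g, d - 1) ** (2 ** (d - 1))
    return max((total / N).real, 0.0) ** (1.0 / 2 ** d)

def shifted(f, n):
    """T^n f(x) = f(x + n)."""
    N = len(f)
    return [f[(x + n) % N] for x in range(N)]

def dilated(f, lam):
    """f_lambda(x) := f(x / lambda), dividing in Z_N (N prime, lambda != 0)."""
    N = len(f)
    inv = pow(lam, -1, N)  # modular inverse of lambda
    return [f[(x * inv) % N] for x in range(N)]

random.seed(8)
f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(11)]
for d in (1, 2, 3):
    print(d, gowers_norm(f, d), gowers_norm(shifted(f, 4), d), gowers_norm(dilated(f, 3), d))
```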

We can now prove the generalized von Neumann theorem. 

10 Actually, more is true: the $U^d$ norms of $f$ increase monotonically and converge to $\|f\|_{L^\infty}$
as $d \to \infty$, although the convergence can be quite slow and depends on $N$. We will not prove
this fact here.
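Proof of Theorem 3.1. We induct on $k$. When $k = 2$ we use the fact that
$(x, r) \mapsto (x + \lambda_0 r, x + \lambda_1 r)$ is a bijection from $\mathbb{Z}_N^2$ to $\mathbb{Z}_N^2$ (recalling that $N$ is
prime) to conclude that

$$\mathbb{E}\Big(\prod_{j=0}^{1} T^{\lambda_j r} f_j(x) \,\Big|\, x, r \in \mathbb{Z}_N\Big) = \mathbb{E}(f_0 \mid \mathbb{Z}_N)\, \mathbb{E}(f_1 \mid \mathbb{Z}_N)$$

and the claim then follows easily from (11) and the boundedness of $f_0, f_1$. Now
suppose that $k > 2$ and the claim has already been proven for $k-1$. By
permuting the $\lambda_j$ if necessary we may assume that the minimum of the $\|f_j\|_{U^{k-1}}$
is attained when $j = 0$. By applying the expectation-preserving map $T^{-\lambda_{k-1} r}$ (i.e.
by subtracting $\lambda_{k-1}$ from each of the $\lambda_j$) we may assume that $\lambda_{k-1}$ is zero. Since the
$\lambda_j$ are distinct, $\lambda_0$ is then non-zero, so by replacing $r$ with $\lambda_0^{-1} r$ if necessary (recall
that $N$ is prime) we may also assume that $\lambda_0 = 1$. The claim can now be written as

$$\Big|\mathbb{E}\Big(f_{k-1}(x)\, \mathbb{E}\Big(\prod_{j=0}^{k-2} T^{\lambda_j r} f_j(x) \,\Big|\, r \in \mathbb{Z}_N\Big) \,\Big|\, x \in \mathbb{Z}_N\Big)\Big| \le \|f_0\|_{U^{k-1}}.$$

By the Cauchy-Schwarz inequality and the boundedness of $f_{k-1}$, it suffices to
prove that

$$\mathbb{E}\Big(\Big|\mathbb{E}\Big(\prod_{j=0}^{k-2} T^{\lambda_j r} f_j(x) \,\Big|\, r \in \mathbb{Z}_N\Big)\Big|^2 \,\Big|\, x \in \mathbb{Z}_N\Big) \le \|f_0\|_{U^{k-1}}^2.$$

But from Lemma 4.1 we have

$$\mathbb{E}\Big(\Big|\mathbb{E}\Big(\prod_{j=0}^{k-2} T^{\lambda_j r} f_j(x) \,\Big|\, r \in \mathbb{Z}_N\Big)\Big|^2 \,\Big|\, x \in \mathbb{Z}_N\Big) = \mathbb{E}\Big(\Big(\prod_{j=0}^{k-2} \overline{T^{\lambda_j r} f_j(x)}\Big)\Big(\prod_{j=0}^{k-2} T^{\lambda_j (r+h)} f_j(x)\Big) \,\Big|\, x, h, r \in \mathbb{Z}_N\Big)$$

$$= \mathbb{E}\Big(\mathbb{E}\Big(\prod_{j=0}^{k-2} T^{\lambda_j r}\big(\overline{f_j}\, T^{\lambda_j h} f_j\big)(x) \,\Big|\, x, r \in \mathbb{Z}_N\Big) \,\Big|\, h \in \mathbb{Z}_N\Big).$$

On the other hand, from the induction hypothesis and the reduction to the case
$\lambda_0 = 1$ we have

$$\Big|\mathbb{E}\Big(\prod_{j=0}^{k-2} T^{\lambda_j r}\big(\overline{f_j}\, T^{\lambda_j h} f_j\big)(x) \,\Big|\, x, r \in \mathbb{Z}_N\Big)\Big| \le \|\overline{f_0}\, T^h f_0\|_{U^{k-2}}$$

for all $h \in \mathbb{Z}_N$. Combining these two facts together, we obtain

$$\mathbb{E}\Big(\Big|\mathbb{E}\Big(\prod_{j=0}^{k-2} T^{\lambda_j r} f_j(x) \,\Big|\, r \in \mathbb{Z}_N\Big)\Big|^2 \,\Big|\, x \in \mathbb{Z}_N\Big) \le \mathbb{E}\big(\|\overline{f_0}\, T^h f_0\|_{U^{k-2}} \,\big|\, h \in \mathbb{Z}_N\big)$$

and the claim follows from (10) and Hölder's inequality. □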
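The following sketch tests the case $k = 3$ of Theorem 3.1 on random bounded functions; it is an illustration, not a proof. The prime moduli, the choices of distinct $\lambda_j$ and the helper names (`gowers_u2`, `average`, `max_violation`) are our own assumptions. The theorem predicts that `max_violation` never exceeds zero (up to rounding).

```python
import cmath
import random

def gowers_u2(f):
    """||f||_{U^2} from (10) with d = 2, using (11) for the inner U^1 norms."""
    N = len(f)
    total = 0.0
    for h in range(N):
        m = sum(complex(f[x]).conjugate() * f[(x + h) % N] for x in range(N)) / N
        total += abs(m) ** 2
    return (total / N) ** 0.25

def average(fs, lams):
    """E(prod_j T^{lam_j r} f_j(x) | x, r in Z_N)."""
    N = len(fs[0])
    total = 0j
    for x in range(N):
        for r in range(N):
            term = 1 + 0j
            for f, lam in zip(fs, lams):
                term *= f[(x + lam * r) % N]
            total += term
    return total / N ** 2

def max_violation(N, lams, trials, seed=0):
    """Largest |average| - min_j ||f_j||_{U^2} over random f_j with |f_j| <= 1 (k = 3)."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        fs = [[cmath.rect(rng.random(), 2 * cmath.pi * rng.random()) for _ in range(N)]
              for _ in lams]
        worst = max(worst, abs(average(fs, lams)) - min(gowers_u2(f) for f in fs))
    return worst

print(max_violation(11, (2, 5, 9), trials=20))  # Theorem 3.1 predicts a value <= 0
```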
Remark 4.6. The notion of Gowers uniformity considered here, namely that the
$U^{k-1}$ norm is small, generalizes the concept of pseudorandomness or linear
uniformity in the $k = 3$ case, which amounts to the assertion that all the Fourier
coefficients of $f$ (except possibly for the zero coefficient) are small; this is the
notion used for instance in [30], [5], [22], [6]. This notion is essentially
equivalent to the pair correlations of all the shifts $T^n f$ being small on the average.
For higher $k$, this notion is insufficient to obtain theorems such as Theorem 3.1;
see [18], [19] for further discussion. In Szemeredi's original arguments [35], [36],
the appropriate concept of uniformity is provided by the notion of ε-regularity,
which roughly corresponds to controlling all the $U^d$ norms for $d < C(\varepsilon)$, while
in the ergodic theory arguments of Furstenberg and later authors, the notion of
uniformity used is that of weak mixing, which roughly corresponds to
controlling the $U^d$ norms for all $d$. Thus these notions of uniformity are significantly
stronger than the one considered here, which fixes $d$ at $k-1$. There is of course
a cost to using such a strong notion of uniformity, namely that one has to
make the tower of structures extremely large in order to eventually attain such
uniformity. In Szemeredi's regularity lemma, for instance, one is forced to lose
constants which are of tower-exponential type in the regularity parameter ε; see
[17]. In the ergodic theory arguments, the situation is even worse; the tower of
invariant σ-algebras given by Furstenberg's structure theorem (the ergodic
theory analogue of Szemeredi's regularity lemma) can be as tall as any countable
ordinal, but no taller; see [2].

Remark 4.7. In the ergodic theory setting, one can also define analogues of the
$U^{k-1}$ norms, giving rise to the concept of invariant σ-algebras whose
complement consists entirely of functions which are Gowers-uniform of order $k-2$;
using this notion (which is much weaker than weak mixing) it is possible to
obtain a version of Furstenberg's structure theorem using only a tower of height
$k-2$ (in particular, a tower of finite height). Indeed, it was the author's
discovery of this fact which led eventually to the quantitative proof presented here;
we have since learnt that this fact is essentially implicit in the work of Host and
Kra [23] and Ziegler [43].

5 Almost periodic functions 

Having defined the Gowers uniformity norms $U^{k-1}$ used for the generalized von
Neumann theorem, we now turn to defining the dual concept of the uniform
almost periodicity norms $UAP^{k-2}$ which we will need for both the recurrence
theorem and the structure theorem. Roughly speaking, if a function $F$ has a bounded
$UAP^{k-2}$ norm, then it should resemble a function of the form (1), which we
shall loosely refer to as a quasiperiodic function of order $k-2$. To quantify this
we make the following observation: if $F$ is of the form (1), then

$$T^n F = \mathbb{E}(c_{n,j}\, g_j \mid j \in J), \qquad (13)$$

where $g_j$ is the bounded function $g_j(x) := e(P_j(x)/N)$, and $c_{n,j}$ is the function
$c_{n,j}(x) := e((P_j(x+n) - P_j(x))/N)$. The point here is that the dependence
on $n$ on the right-hand side only arises through the functions $c_{n,j}$, and those
functions are of the form (1) but with degree $k-3$ instead of $k-2$. Thus,
the shifts of a quasiperiodic function of order $k-2$ can be written as linear
combinations of fixed bounded functions $g_j$, where the coefficients $c_{n,j}$ are not
constant, but are instead quasiperiodic functions of one lower order (footnote 11). We can
pursue the same idea to define the $UAP^d$ norms recursively as follows.

Definition 5.1 (Banach algebras). A space $A$ of functions on $\mathbb{Z}_N$, equipped
with a norm $\|\cdot\|_A : A \to \mathbb{R}^+$, is said to be a Banach algebra if $A$ is a vector
space, $\|\cdot\|_A$ is a norm (i.e. it is homogeneous, non-degenerate, and verifies the
triangle inequality) which is invariant under conjugation $f \mapsto \overline{f}$, and $A$ is closed
under pointwise product with $\|fg\|_A \le \|f\|_A \|g\|_A$ for all $f, g \in A$. We also
assume that

$$\|F\|_{L^\infty} \le \|F\|_A \qquad (14)$$

for all $F \in A$ (actually this property can be deduced from the pointwise product
property and the finite-dimensionality of $A$). We adopt the convention that
$\|f\|_A = \infty$ if $f \notin A$. We say that $A$ is shift-invariant if $\|f\|_A = \|T^n f\|_A$ for all
$n$, and scale-invariant if $\|f_\lambda\|_A = \|f\|_A$ for all $\lambda \in \mathbb{Z}_N \setminus \{0\}$.

Definition 5.2 (Uniform almost periodicity norms). If $A$ is a shift-invariant
Banach algebra of functions on $\mathbb{Z}_N$, we define the space $UAP[A]$ to be the space
of all functions $F$ for which the orbit $\{T^n F : n \in \mathbb{Z}_N\}$ has a representation of the
form

$$T^n F = M\, \mathbb{E}(c_{n,\mathbf{h}}\, g_{\mathbf{h}}) \quad \text{for all } n \in \mathbb{Z}_N \qquad (15)$$

where $M > 0$, $H$ is a finite non-empty set, $g = (g_h)_{h \in H}$ is a collection of
bounded functions, $c = (c_{n,h})_{n \in \mathbb{Z}_N, h \in H}$ is a collection of functions in $A$ with
$\|c_{n,h}\|_A \le 1$, and $\mathbf{h}$ is a random variable taking values in $H$. We define the
norm $\|F\|_{UAP[A]}$ to be the infimum of $M$ over all possible representations of
this form.

Remark 5.3. Note that we are not imposing any size constraints on $H$, which
could in fact get quite large (in fact one could allow $H$ to be infinite, i.e. $\mathbf{h}$
could be a continuous random variable rather than a discrete one, without
actually affecting this definition). It turns out however that we will not need
any information about $H$, or more generally about the probability distribution
of the random variable $\mathbf{h}$. The key point is that the Volterra operator $(c_h)_{h \in H} \mapsto
\mathbb{E}(c_{\mathbf{h}}\, g_{\mathbf{h}})$ will be a "compact" operator uniformly in the choice of $\mathbf{h}$ and $(g_h)_{h \in H}$.

We first observe that the construction $A \mapsto UAP[A]$ maps shift-invariant
Banach algebras to shift-invariant Banach algebras:

Proposition 5.4. If $A$ is a shift-invariant Banach algebra, then so is $UAP[A]$.
Furthermore $UAP[A]$ contains $A$, and $\|f\|_{UAP[A]} \le \|f\|_A$ for all $f \in A$. Finally,
if $A$ is scale-invariant then so is $UAP[A]$.

11 This observation was motivated by the use of relatively almost periodic functions in the
ergodic theory arguments of Furstenberg [9], [14], [10] and later authors.
Remark 5.5. This is a quantitative analogue of the well known fact in ergodic
theory that the almost periodic functions form a shift-invariant algebra.

Proof. It is easy to see that $UAP[A]$ is shift-invariant, conjugation-invariant,
closed under scalar multiplication, preserves scale-invariance, and that the $UAP[A]$
norm is non-negative and homogeneous. From (14) and (15) we see that $\|f\|_{L^\infty} \le
\|f\|_{UAP[A]}$ for all $f \in UAP[A]$, from which we deduce that the $UAP[A]$ norm is
non-degenerate. Also we easily verify that $UAP[A]$ contains $A$ with $\|f\|_{UAP[A]} \le
\|f\|_A$ for all $f \in A$. Next, we show that $UAP[A]$ is closed under addition and
that the $UAP[A]$ norm enjoys the triangle inequality. By homogeneity and
non-degeneracy it suffices to show that the unit ball is convex, i.e. if $F, F' \in UAP[A]$
are such that $\|F\|_{UAP[A]}, \|F'\|_{UAP[A]} \le 1$ then $(1-\theta)F + \theta F' \in UAP[A]$ with
$\|(1-\theta)F + \theta F'\|_{UAP[A]} \le 1$ for all $0 \le \theta \le 1$. By Definition 5.2 we can find
non-empty finite sets $H, H'$, bounded functions $(g_h)_{h \in H}$ and $(g'_{h'})_{h' \in H'}$,
functions $(c_{n,h})_{n \in \mathbb{Z}_N, h \in H}$ and $(c'_{n,h'})_{n \in \mathbb{Z}_N, h' \in H'}$ in $A$, and random variables $\mathbf{h}$,
$\mathbf{h}'$ taking values in $H$ and $H'$ respectively such that we have the representations

$$T^n F = \mathbb{E}(c_{n,\mathbf{h}}\, g_{\mathbf{h}}); \quad T^n F' = \mathbb{E}(c'_{n,\mathbf{h}'}\, g'_{\mathbf{h}'}) \quad \text{for all } n \in \mathbb{Z}_N \qquad (16)$$

and the estimates

$$\|c_{n,h}\|_A,\ \|c'_{n,h'}\|_A \le 1 \quad \text{for all } n \in \mathbb{Z}_N,\ h \in H,\ h' \in H'.$$

Also, by relabeling $H'$ if necessary we may assume that $H$ and $H'$ are disjoint.
In such a case we can concatenate $(c_{n,h})_{n \in \mathbb{Z}_N, h \in H}$ and $(c'_{n,h'})_{n \in \mathbb{Z}_N, h' \in H'}$ to
a single family $(c_{n,h})_{n \in \mathbb{Z}_N, h \in H \cup H'}$, and similarly concatenate the $g$ functions
to $(g_h)_{h \in H \cup H'}$. If one then defines the random variable $\tilde{\mathbf{h}}$ to equal $\mathbf{h}$ with
probability $1-\theta$ and $\mathbf{h}'$ with probability $\theta$ (or more precisely, the probability
distribution of $\tilde{\mathbf{h}}$ is $1-\theta$ times that of $\mathbf{h}$ plus $\theta$ times that of $\mathbf{h}'$), then one sees
from linearity of expectation that

$$T^n\big((1-\theta)F + \theta F'\big) = \mathbb{E}(c_{n,\tilde{\mathbf{h}}}\, g_{\tilde{\mathbf{h}}}) \quad \text{for all } n \in \mathbb{Z}_N$$

and the claim follows.

Next, we establish the algebra property. By homogeneity and non-degeneracy
again it suffices to show that the unit ball is closed under multiplication. To see
this, start with (16). Without loss of generality we may assume that the random
variables $\mathbf{h}, \mathbf{h}'$ are independent (because it is only their individual distributions
which matter for (16), not their joint distribution). But in that case we have

$$T^n(FF') = \mathbb{E}(c_{n,\mathbf{h},\mathbf{h}'}\, g_{\mathbf{h},\mathbf{h}'}) \quad \text{for all } n \in \mathbb{Z}_N$$

where $c_{n,h,h'} := c_{n,h}\, c'_{n,h'}$ and $g_{h,h'} := g_h\, g'_{h'}$. Since the product of two bounded
functions is a bounded function, the claim follows from the algebra property of
$A$. □

Thanks to this proposition, we can define the $UAP^d$ norms recursively for
$d \ge 0$, by setting $UAP^0$ to be the trivial Banach algebra of all constant functions
(equipped with the $L^\infty$ norm), and then setting $UAP^d := UAP[UAP^{d-1}]$ for
all $d \ge 1$. Thus $UAP^d$ is a shift-invariant, scale-invariant Banach algebra for
all $d \ge 0$.
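To see the mechanism behind (13) concretely, the sketch below takes a single hypothetical quadratic phase $g(x) = e(P(x)/N)$ with $P(x) = 5x^2 + 3x$ and $N = 17$ (our own choice), and checks that $T^n g = c_n g$ where $c_n(x) = e((P(x+n) - P(x))/N)$. It also checks that each $c_n$ is a linear phase, i.e. that $c_n(x+1)/c_n(x)$ does not depend on $x$, reflecting the drop in degree by one.

```python
import cmath

def e(theta):
    """e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)

# Hypothetical data: a prime modulus and the quadratic P(x) = a x^2 + b x.
N, a, b = 17, 5, 3

def P(x):
    return (a * x * x + b * x) % N

g = [e(P(x) / N) for x in range(N)]  # the bounded function g(x) = e(P(x)/N)

def c(n):
    """c_n(x) = e((P(x+n) - P(x))/N); the exponent 2anx + an^2 + bn is linear in x."""
    return [e(((P(x + n) - P(x)) % N) / N) for x in range(N)]

def shift(f, n):
    """T^n f(x) = f(x + n)."""
    return [f[(x + n) % len(f)] for x in range(len(f))]

# T^n g = c_n * g for every n, as in (13) with a single term.
errors = [max(abs(s - cx * gx) for s, cx, gx in zip(shift(g, n), c(n), g)) for n in range(N)]
print(max(errors))  # rounding error only
```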

Example 5.6. If $F$ is of the form (1), then one can verify by (13) and induction
that $F \in UAP^{k-2}$ with $\|F\|_{UAP^{k-2}} \le 1$ (here $\mathbf{h}$ is an element of $\{1, \ldots, J\}$
chosen uniformly at random). In particular, $UAP^1$ contains the Wiener algebra
of functions with absolutely convergent Fourier series. In our finitary setting
of $\mathbb{Z}_N$, this implies that every function lies in $UAP^1$ and hence in all higher
$UAP^d$ spaces, though the norm may grow with $N$ and thus be very large. Note
that the property of being in the Wiener algebra is substantially stronger than
being almost periodic, which is roughly equivalent to asking that the Fourier
coefficients are summable in $l^{2-\varepsilon}$ for some $\varepsilon > 0$ rather than being summable in
$l^1$. For further comparison, the property of being bounded in $(U^2)^*$, as discussed
in [21], is stronger than being almost periodic but weaker than being bounded
in $UAP^1$; it is equivalent to asking for the Fourier coefficients to be summable
in $l^{4/3}$. See [21] for further discussion.
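The inclusion of the Wiener algebra in $UAP^1$ comes from an explicit representation of the form (15). If $F(x) = \sum_\xi a_\xi e(\xi x/N)$, one can take $M := \sum_\xi |a_\xi|$, let $\mathbf{h}$ equal $\xi$ with probability $|a_\xi|/M$, and set $g_\xi(x) := e(\xi x/N)$ and $c_{n,\xi} := (a_\xi/|a_\xi|)\, e(\xi n/N)$, a constant of modulus one (hence of $UAP^0$ norm 1); this shows $\|F\|_{UAP^1} \le \sum_\xi |a_\xi|$. The Python sketch below, with the hypothetical helper `wiener_representation`, builds this representation for random coefficients and checks it against the shifts of $F$.

```python
import cmath
import random

def e(theta):
    """e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)

def wiener_representation(coeffs, N):
    """Representation (15) for F(x) = sum_xi coeffs[xi] e(xi x / N).

    Returns (M, probs, g, c) with T^n F(x) = M * sum_xi probs[xi] * c(n, xi) * g[xi][x];
    each c(n, xi) is a constant of modulus one, so it has UAP^0 norm 1.
    Assumes at least one coefficient is non-zero.
    """
    support = [xi for xi in range(N) if coeffs[xi] != 0]
    M = sum(abs(coeffs[xi]) for xi in support)
    probs = {xi: abs(coeffs[xi]) / M for xi in support}  # distribution of h
    g = {xi: [e(xi * x / N) for x in range(N)] for xi in support}

    def c(n, xi):
        return coeffs[xi] / abs(coeffs[xi]) * e(xi * n / N)

    return M, probs, g, c

random.seed(4)
N = 7
coeffs = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(N)]
F = [sum(coeffs[xi] * e(xi * x / N) for xi in range(N)) for x in range(N)]
M, probs, g, c = wiener_representation(coeffs, N)

def represented(n, x):
    """Right-hand side of (15) for this representation, evaluated at x."""
    return M * sum(probs[xi] * c(n, xi) * g[xi][x] for xi in probs)

error = max(abs(F[(x + n) % N] - represented(n, x)) for n in range(N) for x in range(N))
print(M, error)  # M bounds ||F||_{UAP^1}; the error is rounding only
```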

Example 5.7. There are more subtle examples of almost periodic functions than
the quasiperiodic ones. One example is the function

$$f(x) := e([ax/N]\, b/N)\, \psi(ax/N \bmod N)\, \psi(x/N \bmod 1)$$

for some fixed $1 < a, b < N$, where $x$ is thought of as an integer from 1 to
$N$, $[x]$ denotes the integer part of $x$ and $\psi(x)$ is a smooth cutoff to the region
$0.4 \le x - [x] \le 0.6$. This function has a $UAP^1$ norm of $O(1)$ uniformly
in $a, b, N$, but the required representation of the form (15) is not particularly
obvious (for instance one can set $g_h$ to be various translations and modulations
of $f$, and then $T^n f$ can be decomposed as an absolutely summable combination
of the $g_h$ using smooth partitions of unity and Fourier series). In this case, one
can eventually work out that $f$ also has an absolutely convergent Fourier series;
however things are even less clear for the function

$$f(x) := e([ax/N]\, bx/N)\, \psi(ax/N \bmod N)\, \psi(x/N \bmod 1), \qquad (17)$$

which has a $UAP^2$ norm of $O(1)$ but seems to have no particular resemblance
to any quadratic phase function. These "generalized quadratic phase
functions" are related to 2-step nilsystems, which are known to not always admit
quadratic eigenfunctions; see e.g. [15] for further discussion. Intriguingly, hints
of this "generalized quadratic" structure also emerge in the work of Gowers [18].
The situation here is still far from clear, though, and further study is needed.

The structure theorem, Theorem 3.5, can be viewed as some sort of duality
relationship between $UAP^{k-2}$ and $U^{k-1}$. We now provide two demonstrations
of this duality. The first such demonstration is rather simple, but is not actually
used in the proof of Szemeredi's theorem; the second demonstration will be to
some extent a converse of the first and is one of the key components used to
prove Theorem 3.5.
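Proposition 5.8 (Uniformity is orthogonal to almost periodicity). Let
$k \ge 2$. For any functions $f, F$ with $F \in UAP^{k-2}$, we have

$$|\langle f, F \rangle| \le \|f\|_{U^{k-1}} \|F\|_{UAP^{k-2}}.$$

Proof. We induct on $k$. When $k = 2$ the claim follows from (11) and the
fact that $F$ is necessarily constant. Now suppose that $k \ge 3$ and the claim
has already been proven for $k-1$. By homogeneity it suffices to show that if
$\|f\|_{U^{k-1}} \le 1$ and $\|F\|_{UAP^{k-2}} < 1$ then $|\langle f, F \rangle| \le 1$.

By Definition 5.2 we can find $H, g, c, \mathbf{h}$ with the representation (15) (absorbing the
factor $M < 1$ into the $c_{n,h}$), with $g_h$ bounded and $\|c_{n,h}\|_{UAP^{k-3}} \le 1$ for all $n \in \mathbb{Z}_N$,
$h \in H$. Next, we use (15) and the unitary nature of $T^n$ to write

$$\langle f, F \rangle = \langle T^n f, T^n F \rangle = \mathbb{E}\big(T^n f(x)\, \mathbb{E}\big(\overline{c_{n,\mathbf{h}}(x)\, g_{\mathbf{h}}(x)}\big) \,\big|\, x \in \mathbb{Z}_N\big);$$

averaging over $n$ and rearranging we thus have

$$\langle f, F \rangle = \mathbb{E}\Big(\mathbb{E}\Big(\mathbb{E}\big(T^n f(x)\, \overline{c_{n,\mathbf{h}}(x)} \,\big|\, n \in \mathbb{Z}_N\big)\, \overline{g_{\mathbf{h}}(x)} \,\Big|\, x \in \mathbb{Z}_N\Big)\Big).$$

By the Cauchy-Schwarz inequality and the boundedness of the $g_h$, we thus have

$$|\langle f, F \rangle| \le \mathbb{E}\Big(\mathbb{E}\Big(\big|\mathbb{E}\big(T^n f(x)\, \overline{c_{n,\mathbf{h}}(x)} \,\big|\, n \in \mathbb{Z}_N\big)\big|^2 \,\Big|\, x \in \mathbb{Z}_N\Big)\Big)^{1/2}.$$

But from Lemma 4.1 we have

$$\big|\mathbb{E}\big(T^n f(x)\, \overline{c_{n,h}(x)} \,\big|\, n \in \mathbb{Z}_N\big)\big|^2 = \mathbb{E}\big(T^n(\overline{f}\, T^r f)(x)\, c_{n,h}(x)\, \overline{c_{n+r,h}(x)} \,\big|\, n, r \in \mathbb{Z}_N\big),$$

whence

$$|\langle f, F \rangle| \le \mathbb{E}\Big(\mathbb{E}\big(\langle \overline{f}\, T^r f,\ T^{-n}(\overline{c_{n,\mathbf{h}}}\, c_{n+r,\mathbf{h}}) \rangle \,\big|\, n, r \in \mathbb{Z}_N\big)\Big)^{1/2}.$$

Since $UAP^{k-3}$ is a shift-invariant Banach algebra we have $\|T^{-n}(\overline{c_{n,h}}\, c_{n+r,h})\|_{UAP^{k-3}} \le
1$. By the inductive hypothesis we thus have

$$\big|\langle \overline{f}\, T^r f,\ T^{-n}(\overline{c_{n,h}}\, c_{n+r,h}) \rangle\big| \le \|\overline{f}\, T^r f\|_{U^{k-2}},$$

whence

$$|\langle f, F \rangle| \le \mathbb{E}\Big(\mathbb{E}\big(\|\overline{f}\, T^r f\|_{U^{k-2}} \,\big|\, n, r \in \mathbb{Z}_N\big)\Big)^{1/2}.$$

The outer expectation can be discarded since the quantity inside the expectation
is deterministic (it does not depend on $\mathbf{h}$). We may similarly discard the redundant
$n$ average. Using Hölder's inequality and (10), we thus obtain

$$|\langle f, F \rangle| \le \|f\|_{U^{k-1}} \le 1$$

as desired. □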
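The $UAP$ norm is an infimum and is not easy to compute directly, but Example 5.6 shows that each linear phase $F(x) = e(ax/N)$ has $\|F\|_{UAP^1} \le 1$, so the case $k = 3$ of Proposition 5.8 predicts $|\langle f, F\rangle| \le \|f\|_{U^2}$ for every such $F$. The sketch below checks this special case only; `worst_gap` and the other names are hypothetical, and we take $\langle f, g\rangle := \mathbb{E}(f\overline{g} \mid \mathbb{Z}_N)$ as in the proof above.

```python
import cmath
import random

def e(theta):
    """e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)

def gowers_u2(f):
    """||f||_{U^2} from (10) with d = 2, using (11) for the inner U^1 norms."""
    N = len(f)
    total = 0.0
    for h in range(N):
        m = sum(complex(f[x]).conjugate() * f[(x + h) % N] for x in range(N)) / N
        total += abs(m) ** 2
    return (total / N) ** 0.25

def inner(f, g):
    """<f, g> = E(f conj(g) | Z_N)."""
    return sum(a * complex(b).conjugate() for a, b in zip(f, g)) / len(f)

def worst_gap(f):
    """max over a in Z_N of |<f, e(ax/N)>| - ||f||_{U^2}; Proposition 5.8 predicts <= 0."""
    N = len(f)
    u2 = gowers_u2(f)
    return max(abs(inner(f, [e(a * x / N) for x in range(N)])) - u2 for a in range(N))

random.seed(9)
f = [random.uniform(-1, 1) for _ in range(13)]
character = [e(3 * x / 13) for x in range(13)]
print(worst_gap(f), worst_gap(character))  # both at most 0; the second is 0 up to rounding
```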

Remark 5.9. One can use this Proposition to give an alternate proof of Theorem
3.1, based on the observation (easily verified by induction) that if $f_1, \ldots, f_{k-1}$
are bounded functions and $\lambda_1, \ldots, \lambda_{k-1}$ are distinct non-zero elements of $\mathbb{Z}_N$,
then $\mathbb{E}(\prod_{j=1}^{k-1} T^{\lambda_j r} f_j \mid r \in \mathbb{Z}_N)$ lies in $UAP^{k-2}$ with norm at most 1.
Remark 5.10. In the notation of [21], this shows that the $UAP^{k-2}$ norm is larger
than or equal to the $(U^{k-1})^*$ norm. However, the $UAP^{k-2}$ norm appears to be
strictly stronger. For instance, as observed in [21] the $(U^2)^*$ norm is the $l^{4/3}$
norm of the Fourier coefficients, and this norm does not form a Banach algebra
(in fact, it does not even control $L^\infty$ unless one loses a factor of $N^{1/4}$), and so
cannot be equivalent to the $UAP^1$ norm.

We now give a partial converse to Proposition 5.8, which is the key to
Theorem 3.5.

Lemma 5.11 (Lack of Gowers-uniformity implies correlation with a
UAP function). [21] Let $f$ be a bounded function such that $\|f\|_{U^{k-1}} \ge \varepsilon$ for
some $k \ge 3$ and $\varepsilon > 0$. Then there exists a bounded function $F \in UAP^{k-2}$ with
$\|F\|_{UAP^{k-2}} \le 1$ such that $|\langle f, F \rangle| \ge \varepsilon^{2^{k-1}}$.

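Proof. We need the concept of a dual function from [21]; the ergodic theory
analogues of such functions have also been recently studied in [23], [1]. For any
function $f : \mathbb{Z}_N \to \mathbb{C}$ and any $d \ge 0$, we define the dual function of order $d$ of
$f$, denoted $\mathcal{D}_d(f)$, by the recursive formula

$$\mathcal{D}_0(f) := 1 \qquad (18)$$

(i.e. $\mathcal{D}_0(f)$ is just the constant function 1) and

$$\mathcal{D}_d(f) := \mathbb{E}\big(\mathcal{D}_{d-1}(f\, \overline{T^h f})\, T^h f \,\big|\, h \in \mathbb{Z}_N\big) \qquad (19)$$

for all $d \ge 1$.

We now claim the identity

$$\langle f, \mathcal{D}_d(f) \rangle = \|f\|_{U^d}^{2^d}$$

for all $d \ge 0$. When $d = 0$ the claim follows from (18) and (9). Now suppose
inductively that $d \ge 1$ and the claim has already been proven for $d-1$. By (19)
(and the definition of the inner product) we have

$$\langle f, \mathcal{D}_d(f) \rangle = \mathbb{E}\big(\langle f\, \overline{T^h f},\ \mathcal{D}_{d-1}(f\, \overline{T^h f}) \rangle \,\big|\, h \in \mathbb{Z}_N\big),$$

and the claim now follows from the inductive hypothesis and (10), since $f\, \overline{T^h f}$ is the
complex conjugate of $\overline{f}\, T^h f$ and the average over $h$ in (10) is unchanged under this
conjugation (for $d \ge 2$ because the $U^{d-1}$ norm is conjugation-invariant, and for $d = 1$
because both averages equal $|\mathbb{E}(f \mid \mathbb{Z}_N)|^2$).

We thus set $F := \mathcal{D}_{k-1}(f)$. It is clear from induction that $F$ is bounded; it
remains to show that $F$ has $UAP^{k-2}$ norm at most 1. Indeed, we make the
more general claim that

$$\|\mathcal{D}_d(f)\|_{UAP^{d-1}} \le 1 \quad \text{for all bounded } f \text{ and all } d \ge 1.$$

When $d = 1$ this is clear since $\mathcal{D}_1(f)$ is just the constant function $\mathbb{E}(f \mid \mathbb{Z}_N)$ in this
case. Now suppose inductively that $d \ge 2$ and the claim has already been proven
for $d-1$. Applying $T^n$ to both sides of (19) and replacing $h$ by $h - n$, we obtain

$$T^n \mathcal{D}_d(f) = \mathbb{E}(c_{n,h}\, g_h \mid h \in \mathbb{Z}_N), \quad \text{where } c_{n,h} := T^n \mathcal{D}_{d-1}(f\, \overline{T^{h-n} f}) \text{ and } g_h := T^h f.$$

The functions $g_h$ are clearly bounded. Since $UAP^{d-2}$ is a shift-invariant Banach
algebra, we see from the inductive hypothesis that the functions $c_{n,h}$ lie in $UAP^{d-2}$
with norm at most 1, and the claim follows (thinking of $h$ as an element of $\mathbb{Z}_N$
chosen uniformly at random). Since $|\langle f, F \rangle| = \|f\|_{U^{k-1}}^{2^{k-1}} \ge \varepsilon^{2^{k-1}}$, the lemma follows. □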
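The dual functions are also easy to compute for small $N$. The sketch below implements the recursion $\mathcal{D}_0(f) = 1$, $\mathcal{D}_d(f) = \mathbb{E}(\mathcal{D}_{d-1}(f\,\overline{T^h f})\, T^h f \mid h \in \mathbb{Z}_N)$ (the names `dual_function`, `inner_product` and `gowers_norm` are hypothetical, and $\langle f, g \rangle := \mathbb{E}(f \overline{g} \mid \mathbb{Z}_N)$). It checks the identity $\langle f, \mathcal{D}_d(f)\rangle = \|f\|_{U^d}^{2^d}$ for a random complex-valued bounded $f$ and for a linear character, together with the bound $|\mathcal{D}_d(f)| \le 1$. For complex-valued $f$ the placement of the complex conjugate inside $\mathcal{D}_{d-1}$ matters; with the conjugate moved onto $f$ instead, the identity fails already for a linear character at $d = 2$.

```python
import cmath
import random

def gowers_norm(f, d):
    """Naive ||f||_{U^d} from the recursion (9), (10)."""
    N = len(f)
    if d == 0:
        return sum(f) / N
    total = 0
    for h in range(N):
        g = [complex(f[x]).conjugate() * f[(x + h) % N] for x in range(N)]
        total += gowers_norm(g, d - 1) ** (2 ** (d - 1))
    return max((total / N).real, 0.0) ** (1.0 / 2 ** d)

def dual_function(f, d):
    """D_0(f) = 1 and D_d(f) = E(D_{d-1}(f conj(T^h f)) T^h f | h in Z_N)."""
    N = len(f)
    if d == 0:
        return [1.0] * N
    out = [0j] * N
    for h in range(N):
        shifted = [f[(x + h) % N] for x in range(N)]  # T^h f
        derivative = [f[x] * complex(shifted[x]).conjugate() for x in range(N)]
        lower = dual_function(derivative, d - 1)
        for x in range(N):
            out[x] += lower[x] * shifted[x] / N
    return out

def inner_product(f, g):
    """<f, g> = E(f conj(g) | Z_N)."""
    return sum(a * complex(b).conjugate() for a, b in zip(f, g)) / len(f)

random.seed(5)
f = [complex(random.uniform(-0.7, 0.7), random.uniform(-0.7, 0.7)) for _ in range(7)]
chi = [cmath.exp(2j * cmath.pi * x / 7) for x in range(7)]  # a linear character
for d in range(4):
    print(d, inner_product(f, dual_function(f, d)), gowers_norm(f, d) ** (2 ** d))
print(inner_product(chi, dual_function(chi, 2)))  # ||chi||_{U^2}^4 = 1, up to rounding
```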



6 σ-algebras of almost periodic functions

To prove both the recurrence theorem (Theorem 3.3) and the structure theorem
(Theorem 3.5) it is convenient not just to work with almost periodic functions,
but also with certain σ-algebras generated by them (in analogy with the
ergodic theory arguments). Thus in this section we develop the theory of such
σ-algebras.

We begin by recalling the definition of a σ-algebra (not necessarily associated
with an almost periodic function). This very useful concept is of course
equivalent in the finitary setting of $\mathbb{Z}_N$ to the more familiar notion of a partition of
$\mathbb{Z}_N$, but we will retain the language of probability theory in order to maintain
the analogy with the ergodic theory arguments, and in order to benefit from
such useful concepts as conditional expectation, orthogonality, measurability,
energy, and so forth. See [38] for a further discussion of the connection between
the Szemeredi regularity lemma, partitions, and conditional expectation with
respect to σ-algebras.

Definition 6.1 (σ-algebras). A σ-algebra $\mathcal{B}$ in $\mathbb{Z}_N$ is any collection of subsets
of $\mathbb{Z}_N$ which contains the empty set and the full set $\mathbb{Z}_N$, and is closed under
complementation, unions and intersections. We define the atoms of a σ-algebra
to be the minimal non-empty elements of $\mathcal{B}$ (with respect to set inclusion); it
is clear that the atoms in $\mathcal{B}$ form a partition of $\mathbb{Z}_N$, and $\mathcal{B}$ consists precisely
of arbitrary unions of its atoms (including the empty union $\emptyset$); thus there is a
one-to-one correspondence between σ-algebras and partitions of $\mathbb{Z}_N$. A function
$f : \mathbb{Z}_N \to \mathbb{C}$ is said to be measurable with respect to a σ-algebra $\mathcal{B}$ if all the
level sets of $f$ lie in $\mathcal{B}$, or equivalently if $f$ is constant on each of the atoms of
$\mathcal{B}$.

We define $L^2(\mathcal{B}) \subseteq L^2(\mathbb{Z}_N)$ to be the closed subspace of the Hilbert space
$L^2(\mathbb{Z}_N)$ consisting of $\mathcal{B}$-measurable functions. We can then define the
conditional expectation operator $f \mapsto \mathbb{E}(f|\mathcal{B})$ to be the orthogonal projection of
$L^2(\mathbb{Z}_N)$ to $L^2(\mathcal{B})$. An equivalent definition of conditional expectation is

$$\mathbb{E}(f|\mathcal{B})(x) := \mathbb{E}(f(y) \mid y \in \mathcal{B}(x))$$

for all $x \in \mathbb{Z}_N$, where $\mathcal{B}(x)$ is the unique atom in $\mathcal{B}$ which contains $x$. It is
clear that conditional expectation is a linear self-adjoint orthogonal projection
on $L^2(\mathbb{Z}_N)$, and preserves non-negativity, expectation, and constant functions. In
particular it maps bounded functions to bounded functions. If $\mathbb{E}(f|\mathcal{B})$ is zero
we say that $f$ is orthogonal to $\mathcal{B}$.
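In the finitary setting a σ-algebra is just a partition, so conditional expectation is an average over atoms. The sketch below uses a hypothetical partition of $\mathbb{Z}_{10}$ into three atoms (`conditional_expectation` is our own name) and checks the properties listed above: the operator is idempotent and self-adjoint, preserves expectation and non-negativity, and $f - \mathbb{E}(f|\mathcal{B})$ is orthogonal to every $\mathcal{B}$-measurable function.

```python
import random

def conditional_expectation(f, atoms):
    """E(f | B)(x): the average of f over the atom B(x) containing x.

    `atoms` lists the atoms of B as disjoint lists covering {0, ..., N-1},
    which by Definition 6.1 is the same data as the sigma-algebra B itself.
    """
    out = [0.0] * len(f)
    for atom in atoms:
        avg = sum(f[y] for y in atom) / len(atom)
        for y in atom:
            out[y] = avg
    return out

def mean(values):
    """E(. | Z_N) for a list of values."""
    return sum(values) / len(values)

# Hypothetical partition of Z_10 into three atoms.
atoms = [[0, 1, 2], [3, 4], [5, 6, 7, 8, 9]]
random.seed(6)
f = [random.uniform(0, 1) for _ in range(10)]   # non-negative
g = [random.uniform(-1, 1) for _ in range(10)]
Ef, Eg = conditional_expectation(f, atoms), conditional_expectation(g, atoms)
print(mean(f), mean(Ef))  # expectation is preserved
```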

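If $\mathcal{B}, \mathcal{B}'$ are two σ-algebras, we use $\mathcal{B} \vee \mathcal{B}'$ to denote the σ-algebra generated
by $\mathcal{B}$ and $\mathcal{B}'$ (i.e. the σ-algebra whose atoms are the non-empty intersections of atoms in
$\mathcal{B}$ with atoms in $\mathcal{B}'$).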
Observe that when $\mathcal{B}$ is the trivial σ-algebra $\{\emptyset, \mathbb{Z}_N\}$ then the conditional
expectation $\mathbb{E}(f|\mathcal{B})$ is just the constant function equal to the ordinary
expectation $\mathbb{E}(f|\mathbb{Z}_N)$. Every σ-algebra induces a unique orthogonal decomposition
$f = \mathbb{E}(f|\mathcal{B}) + (f - \mathbb{E}(f|\mathcal{B}))$ of a function $f$ into the component $\mathbb{E}(f|\mathcal{B})$
measurable with respect to $\mathcal{B}$, and the component $f - \mathbb{E}(f|\mathcal{B})$ orthogonal to $\mathcal{B}$.
More generally, if $\mathcal{B}$ is a subalgebra of $\mathcal{B}'$ (thus $\mathcal{B}'$ is finer than $\mathcal{B}$, or $\mathcal{B}$ is
coarser than $\mathcal{B}'$) then we can orthogonally decompose the finer expectation
$\mathbb{E}(f|\mathcal{B}') = \mathbb{E}(f|\mathcal{B}) + (\mathbb{E}(f|\mathcal{B}') - \mathbb{E}(f|\mathcal{B}))$ into the coarser expectation, and a
component measurable in $\mathcal{B}'$ but orthogonal to $\mathcal{B}$.

We now show that each almost periodic function generates a well-behaved
σ-algebra at every scale $\varepsilon$.

Proposition 6.2 (UAP functions generate a compact σ-algebra). Let
$d \ge 0$, let $G \in UAP^d$ be such that $\|G\|_{UAP^d} \le M$ for some $M > 0$, and let
$\varepsilon > 0$. Then there exists a σ-algebra $\mathcal{B}_\varepsilon(G) = \mathcal{B}_\varepsilon(G, d)$ consisting of at most
$O_{M,\varepsilon}(1)$ atoms, such that we have the following two properties:

• ($G$ lies in its own σ-algebra) We have the approximation property

$$\|G - \mathbb{E}(G|\mathcal{B}_\varepsilon(G))\|_{L^\infty} = O(\varepsilon). \qquad (20)$$

Similarly if $\mathcal{B}_\varepsilon(G)$ is replaced by any finer σ-algebra.

• (Approximation by almost periodic functions) For any bounded non-negative
function $f$ which is measurable in $\mathcal{B}_\varepsilon(G)$, and any $\delta > 0$, there exists a
bounded non-negative function $f_{UAP} \in UAP^d$ such that

$$\|f - f_{UAP}\|_{L^2} \le \delta \qquad (21)$$

and

$$\|f_{UAP}\|_{UAP^d} = O_{M,\varepsilon,\delta}(1). \qquad (22)$$

Remark 6.3. As the proof shows, the above Proposition in fact holds if $UAP^d$ is replaced by any other Banach algebra.

Proof. We shall prove this by constructing $\mathcal{B}_\varepsilon(G)$ using randomized level sets (or "generalized Bohr sets") of $G$, using some ideas from [21]. Let $S := \{z \in \mathbb{C} : -1/2 \leq \operatorname{Re}(z), \operatorname{Im}(z) < 1/2\}$ be the unit square in the complex plane, and let $\mathbb{Z}[i] := \{a + bi : a, b \in \mathbb{Z}\}$ denote the Gaussian integers. Let $\alpha \in S$ be a complex number chosen uniformly at random from $S$. We can then define the σ-algebra $\mathcal{B}_{\varepsilon,\alpha}(G)$ to be the algebra whose atoms are the sets $\{G^{-1}(\varepsilon(S + \zeta + \alpha)) : \zeta \in \mathbb{Z}[i]\}$. It will suffice to show that with positive probability this algebra $\mathcal{B}_{\varepsilon,\alpha}(G)$ is a candidate for $\mathcal{B}_\varepsilon(G)$ (with the bounds in (20), (22) uniform in $\alpha$).
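Concretely, $\varepsilon^{-1}G(x) - \alpha$ lies in $S + \zeta$ exactly when $\zeta$ is obtained by rounding its real and imaginary parts to the nearest integer, with halves rounded up to match the half-open square $S$. The sketch below, assuming for illustration the test function $G(x) = e^{2\pi i x/N}$ (not a function from the text) and hypothetical helper names, builds the atoms of $\mathcal{B}_{\varepsilon,\alpha}(G)$ this way and checks the bound behind (20): on each atom $G$ stays in a square of side $\varepsilon$, so $|G - \mathbb{E}(G|\mathcal{B}_{\varepsilon,\alpha}(G))| \leq \sqrt{2}\,\varepsilon$.

```python
import cmath
import math
import random

def atom_label(z, eps, alpha):
    """Gaussian integer zeta with z/eps - alpha in S + zeta, where S = [-1/2, 1/2)^2."""
    w = z / eps - alpha
    return (math.floor(w.real + 0.5), math.floor(w.imag + 0.5))

def level_set_atoms(G, eps, alpha):
    """Atoms of B_{eps,alpha}(G): group the points x by the cell containing G(x)."""
    cells = {}
    for x, z in enumerate(G):
        cells.setdefault(atom_label(z, eps, alpha), []).append(x)
    return list(cells.values())

# Hypothetical example: G(x) = e(x/N) on Z_N and one random shift alpha in S.
N, eps = 60, 0.25
G = [cmath.exp(2j * math.pi * x / N) for x in range(N)]
rng = random.Random(0)
alpha = complex(rng.random() - 0.5, rng.random() - 0.5)
atoms = level_set_atoms(G, eps, alpha)

# On each atom G stays in a square of side eps, hence within sqrt(2)*eps of its atom average.
max_dev = 0.0
for atom in atoms:
    mean = sum(G[x] for x in atom) / len(atom)
    max_dev = max(max_dev, max(abs(G[x] - mean) for x in atom))
print(len(atoms), max_dev <= math.sqrt(2) * eps)
```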

The bound (20) is clear since $G$ is constrained to lie in a square of diameter $O(\varepsilon)$ on each atom of $\mathcal{B}_{\varepsilon,\alpha}$ (and hence on each atom of any finer σ-algebra). Since $\|G\|_{L^\infty} \leq \|G\|_{UAP^d} \leq M$, $G$ takes values in a ball of radius $O(M)$, and thus the number of atoms in $\mathcal{B}_{\varepsilon,\alpha}$ is indeed $O_{M,\varepsilon}(1)$ as claimed. Now we turn to the approximation property. It will suffice to prove that for each $\delta > 0$ and every $\eta > 0$, the approximation property is true (with the bound in (22) allowed to depend on $\eta$) with probability at least $1 - \eta$, since one can then set $\delta := 2^{-n}$ and $\eta := \delta/2$ for $n = 1, 2, \ldots$ (for instance) and conclude the claim is true for all $\delta$ with positive probability.

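Now fix $\delta, \eta$. Since every bounded non-negative function $f$ measurable in $\mathcal{B}_{\varepsilon,\alpha}(G)$ is a convex combination of indicator functions of the form $1_\Omega$ where $\Omega \in \mathcal{B}_{\varepsilon,\alpha}(G)$, and the number of such functions is $O_{M,\varepsilon}(1)$ (since $\mathcal{B}_{\varepsilon,\alpha}(G)$ has only $O_{M,\varepsilon}(1)$ atoms), it suffices (after shrinking $\eta$ appropriately) to prove the claim for a single indicator function $f := 1_\Omega$.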

Fix $\Omega$; we can then write

$$f = 1_\Omega = 1_W(\varepsilon^{-1}G - \alpha)$$

where $W \subseteq \mathbb{C}$ is some union of $O_{M,\varepsilon}(1)$ translates of the unit square $S$.

Let $0 < \sigma < 1$ be a small number (it will eventually be much smaller than $\delta$, $\eta$, or $\varepsilon$) to be chosen later. Let $\partial W_\sigma$ be the $\sigma$-neighbourhood of the boundary $\partial W$ of $W$. By Urysohn's lemma combined with the Weierstrass approximation theorem (and the fact that $G = O(M)$) we can write

$$f = P\big(\varepsilon^{-1}G - \alpha, \overline{\varepsilon^{-1}G - \alpha}\big) + O(\sigma) + O(1)\,1_{\partial W_\sigma}(\varepsilon^{-1}G - \alpha) \qquad (23)$$

for some polynomial $P = P_{W,M,\varepsilon,\sigma}$ of two complex variables. Denote the first term on the right-hand side of (23) by $f_{UAP}$; then from the $UAP^d$ hypothesis on $G$ and the Banach algebra nature of $UAP^d$ we have

$$\|f_{UAP}\|_{UAP^d} = O_{M,P}(1) = O_{M,\varepsilon,\sigma}(1), \qquad (24)$$

which will give (22) once $\sigma$ is selected properly at the end of the argument.
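Now consider the third term on the right-hand side of (23). Observe that

$$\|1_{\partial W_\sigma}(\varepsilon^{-1}G - \alpha)\|_{L^2}^2 = \mathbb{P}(\varepsilon^{-1}G(x) - \alpha \in \partial W_\sigma \mid x \in \mathbb{Z}_N)$$
$$\leq \mathbb{P}(\varepsilon^{-1}G(x) - \alpha \in \partial S_\sigma + \zeta \text{ for some } \zeta \in \mathbb{Z}[i] \mid x \in \mathbb{Z}_N)$$

where $\partial S_\sigma$ is the $\sigma$-neighbourhood of the boundary $\partial S$ of the unit square. Observe that as $\alpha$ varies over $S$, the event

$$\varepsilon^{-1}G(x) - \alpha \in \partial S_\sigma + \zeta \text{ for some } \zeta \in \mathbb{Z}[i]$$

has probability $O(\sigma)$ regardless of what $\varepsilon^{-1}G(x)$ is. Averaging over $x$, we thus have (with the outer expectation taken over the random choice of $\alpha$)

$$\mathbb{E}\big(\|1_{\partial W_\sigma}(\varepsilon^{-1}G - \alpha)\|_{L^2}^2\big) \leq O(\sigma).$$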
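The $O(\sigma)$ bound can be made exact: since $\alpha$ is uniform on $S$, the point $w - \alpha$ reduced modulo $\mathbb{Z}[i]$ is uniform on $S$ for every fixed $w$, so the event above has probability exactly $1 - (1-2\sigma)^2 = 4\sigma - 4\sigma^2$ when $\sigma \leq 1/2$. The Monte Carlo sketch below (hypothetical function names, a fixed seed for reproducibility) estimates this probability for several values of $w = \varepsilon^{-1}G(x)$ and should return roughly the same number each time.

```python
import random

def dist_to_half_integers(t):
    """Distance from the real number t to the set 1/2 + Z."""
    s = t - 0.5
    return abs(s - round(s))

def near_grid_boundary(u, sigma):
    """Is u within sigma of dS + Z[i], i.e. of the lines Re = 1/2 + Z or Im = 1/2 + Z?"""
    return min(dist_to_half_integers(u.real), dist_to_half_integers(u.imag)) < sigma

def boundary_probability(w, sigma, trials, seed=0):
    """Monte Carlo estimate of P_alpha(w - alpha lies in dS_sigma + zeta for some zeta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        alpha = complex(rng.random() - 0.5, rng.random() - 0.5)  # uniform on S
        hits += near_grid_boundary(w - alpha, sigma)
    return hits / trials

sigma = 0.05
exact = 1 - (1 - 2 * sigma) ** 2   # equals 4*sigma - 4*sigma**2 for sigma <= 1/2
for w in (0j, 3.7 + 1.2j, -2.25 + 0.4j):
    print(w, round(boundary_probability(w, sigma, 100_000), 3), round(exact, 3))
```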

By Markov's inequality, we thus see that the expression inside the expectation is $O_\eta(\sigma)$ with probability at least $1 - \eta$. Inserting this into (23) we obtain

$$\|f - f_{UAP}\|_{L^2} = O(\sigma) + O_\eta(\sigma^{1/2}),$$

and the claim (21) follows by choosing $\sigma$ sufficiently small depending on $\delta$ and $\eta$ (and then (22) will hold by (24)). □
Henceforth we shall fix an assignment of a σ-algebra $\mathcal{B}_\varepsilon(G) = \mathcal{B}_\varepsilon(G, d)$ with the above properties for each almost periodic function $G \in UAP^d$ and each $\varepsilon > 0$. Note that while we did use a randomization argument here, it is possible to make such an assignment constructive, for instance by well-ordering all the σ-algebras of $\mathbb{Z}_N$ in some constructive way and then choosing the minimal algebra which obeys the above properties (with the exact choice of bounds in (22), etc. held fixed). Thus we do not require the axiom of choice at this step (or indeed at any step in this argument). For similar reasons we may ensure that this procedure is shift-invariant, in the sense that

$$T^n\mathcal{B}_\varepsilon(G) = \mathcal{B}_\varepsilon(T^nG) \quad \text{for all } n \in \mathbb{Z}_N, \qquad (25)$$

where $T^n\mathcal{B} := \{T^n\Omega : \Omega \in \mathcal{B}\}$ is the σ-algebra $\mathcal{B}$ shifted backwards by $n$. This shift invariance amounts to making sure the same $\alpha$ is chosen for all the shifts $T^nG$ of a fixed function $G$, which is easy enough to ensure since the constraints needed for $\alpha$ are independent of the choice of $n$.
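Shift invariance (25) is visible directly in the rounding construction of the atoms when the same $\alpha$ is reused. The sketch below assumes the convention $T^nG(x) = G(x+n)$ on $\mathbb{Z}_N$ (so that $T^n\Omega = \Omega - n$), a hypothetical test function $G$, and hypothetical helper names; it checks that the atoms of $\mathcal{B}_{\varepsilon,\alpha}(T^nG)$ are exactly the translates $\Omega - n$ of the atoms of $\mathcal{B}_{\varepsilon,\alpha}(G)$.

```python
import cmath
import math
import random

def atom_label(z, eps, alpha):
    """Gaussian integer zeta with z/eps - alpha in S + zeta, where S = [-1/2, 1/2)^2."""
    w = z / eps - alpha
    return (math.floor(w.real + 0.5), math.floor(w.imag + 0.5))

def atom_sets(G, eps, alpha):
    """Atoms of B_{eps,alpha}(G) as a set of frozensets of points of Z_N."""
    cells = {}
    for x, z in enumerate(G):
        cells.setdefault(atom_label(z, eps, alpha), set()).add(x)
    return {frozenset(cell) for cell in cells.values()}

def shift(G, n):
    """(T^n G)(x) = G(x + n mod N); this sign convention is an assumption."""
    N = len(G)
    return [G[(x + n) % N] for x in range(N)]

N, eps, n = 15, 0.4, 4
G = [cmath.exp(2j * math.pi * x * x / N) for x in range(N)]   # hypothetical test function
rng = random.Random(2)
alpha = complex(rng.random() - 0.5, rng.random() - 0.5)        # one alpha shared by all shifts

translated = {frozenset((x - n) % N for x in A) for A in atom_sets(G, eps, alpha)}  # T^n B
print(translated == atom_sets(shift(G, n), eps, alpha))  # True
```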

The above Proposition pertained to a σ-algebra generated by a single almost periodic function, but we can easily extend it to algebras generated by multiple functions as follows.

Definition 6.4 (Compact σ-algebras). Let $d \geq 0$ and $X > 0$. A σ-algebra $\mathcal{B}$ is said to be compact of order $d$ and complexity at most $X$ if it has the form

$$\mathcal{B} = \mathcal{B}_{\varepsilon_1}(G_1) \vee \ldots \vee \mathcal{B}_{\varepsilon_K}(G_K) \qquad (26)$$

for some $0 \leq K \leq X$, some $\varepsilon_1, \ldots, \varepsilon_K \geq 1/X$, and some $G_1, \ldots, G_K \in UAP^d$ with norm $\|G_j\|_{UAP^d} \leq X$ for all $1 \leq j \leq K$. We define the $d$-complexity (or simply complexity) of a σ-algebra to be the minimal $X$ for which one has the above representation, or $\infty$ if no such representation exists. In particular, the trivial σ-algebra $\mathcal{B} = \{\emptyset, \mathbb{Z}_N\}$ is compact of order $d$ with complexity 0.
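The join in (26) is computed atom by atom, so the number of atoms of a join is at most the product of the atom counts of its factors, which is consistent with the $O_{d,X}(1)$ atom bound used in the proof of Proposition 6.6. A minimal Python sketch (hypothetical helper `join`, toy partitions of $\mathbb{Z}_{12}$ chosen for illustration): joining the residue classes mod 2 with those mod 3 recovers the residue classes mod 6.

```python
def join(atoms1, atoms2):
    """Atoms of B v B': all non-empty intersections of an atom of B with an atom of B'."""
    return [sorted(set(a) & set(b)) for a in atoms1 for b in atoms2 if set(a) & set(b)]

N = 12
mod2 = [[x for x in range(N) if x % 2 == r] for r in range(2)]
mod3 = [[x for x in range(N) if x % 3 == r] for r in range(3)]
mod6 = [[x for x in range(N) if x % 6 == r] for r in range(6)]

joined = join(mod2, mod3)
print(sorted(joined) == sorted(mod6))  # True, by the Chinese remainder theorem
```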

Remark 6.5. The terminology is motivated here by ergodic theory, see e.g. [14]; a compact σ-algebra of order $d$ here corresponds in the ergodic setting, roughly speaking, to a tower of height $d$ of compact extensions of the trivial algebra. The complexity $X$ is a rather artificial quantity which we use as a proxy for keeping all the quantities used to define $\mathcal{B}$ under control.

The key property we need concerning these σ-algebras is that the measurable functions of σ-algebras which are compact of order $d$ are (with high probability) well approximated by almost periodic functions of order $d$:

Proposition 6.6 (UAP functions are dense in compact σ-algebras). Let $d \geq 0$, $X > 0$, and let $\mathcal{B}$ be a σ-algebra which is compact of order $d$ and complexity at most $X$. Let $f$ be a bounded non-negative function which is measurable with respect to $\mathcal{B}$, and let $\delta > 0$. Then we can find a bounded non-negative $f_{UAP} \in UAP^d$ such that

$$\|f - f_{UAP}\|_{L^2} \leq \delta \qquad (27)$$

and

$$\|f_{UAP}\|_{UAP^d} = O_{d,\delta,X}(1). \qquad (28)$$
Proof. We first verify the claim when $f$ is the indicator function $1_A$ of an atom $A$ of $\mathcal{B}$. From Definition 6.4 we can expand $\mathcal{B}$ in the form (26), and hence we can write $A = A_1 \cap \ldots \cap A_K$ where each $A_j$ is an atom in $\mathcal{B}_{\varepsilon_j}(G_j)$. From Proposition 6.2 and the bounds on $\varepsilon_j$, $G_j$, $K$ coming from Definition 6.4, we can find bounded non-negative functions $f_{UAP,j} \in UAP^d$ for all $1 \leq j \leq K$ such that

$$\|1_{A_j} - f_{UAP,j}\|_{L^2} \leq \delta/K$$

and

$$\|f_{UAP,j}\|_{UAP^d} = O_{\delta/K,\varepsilon_j,X}(1) = O_{d,\delta,X}(1).$$

Since $1_{A_j}$ and $f_{UAP,j}$ are both bounded and non-negative we have the elementary pointwise inequality

$$\Big|\prod_{j=1}^K 1_{A_j} - \prod_{j=1}^K f_{UAP,j}\Big| \leq \sum_{j=1}^K |1_{A_j} - f_{UAP,j}|$$

and hence if we set $f_{UAP} := \prod_{j=1}^K f_{UAP,j}$ then (27) follows from the triangle inequality, and (28) follows from the Banach algebra nature of $UAP^d$. Since $f_{UAP}$ is clearly bounded and non-negative, the claim follows.
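The pointwise inequality has a one-line telescoping proof: $\prod_j a_j - \prod_j b_j = \sum_j (a_1\cdots a_{j-1})(a_j - b_j)(b_{j+1}\cdots b_K)$, and every partial product lies in $[0,1]$ when all $a_j, b_j \in [0,1]$. The Python sketch below (our own random check, with hypothetical names) verifies both the identity and the inequality on random inputs, with the $b_j$ taking values in $\{0,1\}$ like the indicators $1_{A_j}$.

```python
import random
from math import isclose, prod

def telescoping_terms(a, b):
    """Terms (a_1...a_{j-1}) (a_j - b_j) (b_{j+1}...b_K) whose sum is prod(a) - prod(b)."""
    return [prod(a[:j]) * (a[j] - b[j]) * prod(b[j + 1:]) for j in range(len(a))]

rng = random.Random(1)
ok = True
for _ in range(10_000):
    K = rng.randint(1, 6)
    a = [rng.random() for _ in range(K)]                 # values in [0, 1], like f_UAP,j
    b = [float(rng.random() < 0.5) for _ in range(K)]    # values in {0, 1}, like 1_{A_j}
    ok &= isclose(sum(telescoping_terms(a, b)), prod(a) - prod(b), abs_tol=1e-12)
    ok &= abs(prod(a) - prod(b)) <= sum(abs(x - y) for x, y in zip(a, b)) + 1e-12
print(ok)  # True
```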

Now suppose $f$ is an arbitrary bounded non-negative function measurable with respect to $\mathcal{B}$. Then we can write $f = \sum_A c_A 1_A$ where $A$ ranges over the atoms of $\mathcal{B}$ and $0 \leq c_A \leq 1$ are constants. Let $\sigma = \sigma(d, \delta, X) > 0$ be a small number to be chosen later; then by the preceding discussion we can find bounded non-negative $f_{UAP,A} \in UAP^d$ for all $A$ such that

$$\|1_A - f_{UAP,A}\|_{L^2} \leq \sigma$$

and

$$\|f_{UAP,A}\|_{UAP^d} = O_{d,X,\sigma}(1).$$

If we then set $\tilde f_{UAP} := \sum_A c_A f_{UAP,A}$ and observe from Proposition 6.2 that $\mathcal{B}$ contains at most $O_{d,X}(1)$ atoms, we thus have

$$\|f - \tilde f_{UAP}\|_{L^2} \leq O_{d,X}(\sigma)$$

and

$$\|\tilde f_{UAP}\|_{UAP^d} = O_{d,X,\sigma}(1).$$

We are, however, not done yet, because while $\tilde f_{UAP}$ is non-negative, it is not bounded by 1; instead we have a bound of the form $0 \leq \tilde f_{UAP}(x) \leq O_{d,X}(1)$. To fix this we need a real-valued polynomial $P(x) = P_{d,\delta,X}(x)$ such that

$$|P(x) - \min(x, 1)| \leq \delta/2 \quad \text{and} \quad 0 \leq P(x) \leq 1 \quad \text{for all } x \text{ in the range of } \tilde f_{UAP};$$

such a polynomial exists by the Weierstrass approximation theorem. If we then set $f_{UAP} := P(\tilde f_{UAP})$, then $f_{UAP}$ is bounded and non-negative, and we have

$$\|f_{UAP} - \min(\tilde f_{UAP}, 1)\|_{L^2} \leq \delta/2$$

and (since $UAP^d$ is a Banach algebra)

$$\|f_{UAP}\|_{UAP^d} = O_{d,\delta,X,\sigma}(1).$$

On the other hand, since $f$ is bounded above by 1, we have

$$\|f - \min(\tilde f_{UAP}, 1)\|_{L^2} \leq \|f - \tilde f_{UAP}\|_{L^2} \leq O_{d,X}(\sigma),$$

and the claims then follow from the triangle inequality if $\sigma$ is chosen sufficiently small depending on $d, \delta, X$. □
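The truncation step only needs some polynomial $P$ with $|P(x) - \min(x,1)| \leq \delta/2$ and $0 \leq P \leq 1$ on the range of $\tilde f_{UAP}$. One concrete choice, offered here as an illustration rather than as the construction intended in the text, is a Bernstein polynomial of the truncation $x \mapsto \min(x,1)$ on an interval $[0, R]$: its values are convex combinations of samples lying in $[0,1]$, so the range condition holds automatically, and Bernstein's proof of the Weierstrass theorem gives uniform convergence. The value `R = 4.0` below is a hypothetical stand-in for the bound $O_{d,X}(1)$.

```python
from math import comb

def bernstein_truncation(R, n):
    """Degree-n Bernstein polynomial of g(x) = min(x, 1) on [0, R], returned as a callable.

    Each value is a convex combination of the samples g(kR/n), which lie in [0, 1].
    """
    weights = [min(k * R / n, 1.0) * comb(n, k) for k in range(n + 1)]

    def P(x):
        t = x / R
        return sum(w * t**k * (1 - t) ** (n - k) for k, w in enumerate(weights))

    return P

R = 4.0
grid = [R * i / 400 for i in range(401)]
errors = []
for n in (10, 100, 400):
    P = bernstein_truncation(R, n)
    values = [P(x) for x in grid]
    assert all(-1e-12 <= v <= 1 + 1e-12 for v in values)      # 0 <= P <= 1 on [0, R]
    errors.append(max(abs(v - min(x, 1.0)) for v, x in zip(values, grid)))
print([round(e, 3) for e in errors])   # the sup-norm error shrinks as n grows
```

Since $\min(x,1)$ is concave, its Bernstein polynomials increase with $n$ towards it, so the error decreases monotonically; the price is a slow $O(n^{-1/2})$ rate at the kink, which is harmless here because only qualitative bounds are needed.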



7 The energy incrementation argument 

The proofs of the recurrence theorem (Theorem 3.3) and the structure theorem (Theorem 3.5) rely not only on σ-algebras of almost periodic functions, which we constructed in the previous section, but also on the notion of the energy of a σ-algebra with respect to a collection of functions, and on a recursive energy incrementation argument. This energy incrementation argument, which was inspired by the proof of the Szemeredi regularity lemma (see e.g. [37]), is perhaps one of the most important aspects of this argument, but unfortunately is also the one which causes the Ackermann-type (or worse) blowup of bounds. It is the counterpart of the more well-known density incrementation argument which appears in several proofs of Szemeredi's theorem (starting with Roth's original argument [30], but see also [18], [19], [22], [6], [35], [31], [36]). In that strategy one passes from the original set $\{1, \ldots, N\}$ to a decreasing sequence of similarly structured subsets (e.g. arithmetic progressions or Bohr sets) while forcing the density $\delta$ of the set $A$ to increase as one progresses along the sequence; eventually one finds enough "randomness" to obtain an arithmetic progression. The hope is to show this algorithm terminates successfully by using the trivial fact that the density is always bounded above by 1. To do this, it is important that the density increment depend only on $\delta$, and not on other parameters such as $N$ or the complexity of the structured subset. This rather stringent requirement on the density increment is one cause of technical complexity and length in several of the arguments mentioned above.

In our situation, the role of "structured subset" will be played by a σ-algebra generated by almost periodic functions, and the role of density by the energy of that σ-algebra. This energy automatically increases as the σ-algebra gets finer, and is also automatically bounded. Once again, however, the energy increment may be very small, depending for instance on the complexity of the σ-algebra, and this algorithm may once again fail to terminate. This problem also appears in the ergodic theory setting, in the context of an infinite tower of σ-algebra extensions; to resolve it one must show that the supremum of any tower of extensions with the recurrence property also has the recurrence property. This appears difficult since the σ-algebras in this tower may become arbitrarily complex, and the lower bound obtained by the recurrence property may go to zero as one approaches the supremum of the tower. Nevertheless, one can conclude the argument, basically by observing that any measurable function in the supremum of the tower can be approximated in $L^2$ norm (say) by a measurable function in some finite component of this tower, and a simple argument then allows one to deduce recurrence for the former function from recurrence for the latter function regardless of how small the recurrence bound is for the latter; see e.g. [14] for an example of this. This may serve to explain why we have the error tolerance (4) in the recurrence theorem.

We begin by defining the energy of a σ-algebra (relative to some fixed collection of functions); this can be thought of as somewhat analogous to the more standard notion of the entropy of an algebra in both information theory and ergodic theory, but the energy will be adapted to a specific fixed collection of functions $f_1, \ldots, f_m$, whereas the entropy is in some sense concerned with all possible functions at once.

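Definition 7.1 (Energy). Given an $m$-tuple $\vec f = (f_1, \ldots, f_m)$ of functions $f_j : \mathbb{Z}_N \to \mathbb{C}$ and a σ-algebra $\mathcal{B}$, we define the energy $\mathcal{E}_{\vec f}(\mathcal{B})$ to be the quantity

$$\mathcal{E}_{\vec f}(\mathcal{B}) := \sum_{j=1}^m \|\mathbb{E}(f_j|\mathcal{B})\|_{L^2}^2. \qquad (29)$$

In practice $m$ will either be 1 or 3. Observe that we have the trivial bounds

$$0 \leq \mathcal{E}_{\vec f}(\mathcal{B}) \leq \sum_{j=1}^m \|f_j\|_{L^2}^2. \qquad (30)$$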

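Also from Pythagoras's theorem and the orthogonality considerations discussed above we see that if $\mathcal{B}'$ is finer than $\mathcal{B}$, then

$$\sum_{j=1}^m \|\mathbb{E}(f_j|\mathcal{B}') - \mathbb{E}(f_j|\mathcal{B})\|_{L^2}^2 = \mathcal{E}_{\vec f}(\mathcal{B}') - \mathcal{E}_{\vec f}(\mathcal{B}). \qquad (31)$$

In particular, the energy of $\mathcal{B}'$ is larger than or equal to that of $\mathcal{B}$.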
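Identity (31) and the bounds (30) can be checked on a toy example. The sketch below (hypothetical names and data, exact rational arithmetic) computes $\mathcal{E}_{\vec f}(\mathcal{B})$ for $m = 2$ indicator-valued functions on $\mathbb{Z}_8$ and a pair of nested σ-algebras.

```python
from fractions import Fraction

def cond_exp(f, atoms):
    """E(f|B) on Z_N: average f over the atom containing each point."""
    g = [None] * len(f)
    for atom in atoms:
        avg = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            g[x] = avg
    return g

def norm_sq(f):
    """||f||_{L^2}^2 = E(|f|^2 | Z_N) for real-valued f."""
    return sum(v * v for v in f) / len(f)

def energy(fs, atoms):
    """E_f(B) = sum_j ||E(f_j|B)||_{L^2}^2, as in (29)."""
    return sum(norm_sq(cond_exp(f, atoms)) for f in fs)

fs = [[Fraction(v) for v in row] for row in ([1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1])]
B = [[0, 1, 2, 3], [4, 5, 6, 7]]
B_fine = [[0, 1], [2, 3], [4, 5], [6, 7]]

gap = sum(norm_sq([a - b for a, b in zip(cond_exp(f, B_fine), cond_exp(f, B))]) for f in fs)
print(gap == energy(fs, B_fine) - energy(fs, B))                                # (31): True
print(0 <= energy(fs, B) <= energy(fs, B_fine) <= sum(norm_sq(f) for f in fs))  # (30): True
```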

We now describe, in abstract terms, the idea of the energy increment strategy. Suppose one is trying to prove a statement $P(M)$ involving some large parameter $M > 0$ which one hopes to keep under control; for instance, one may be trying to bound some fixed expression $E$ from above by $M$ or from below by $1/M$. To begin with, this statement does not depend on any σ-algebras. But now we introduce a σ-algebra $\mathcal{B}$, which we initialize to be the trivial algebra $\mathcal{B} = \{\emptyset, \mathbb{Z}_N\}$, and try to prove $P(M)$ using an argument which is in some sense "relative to $\mathcal{B}$" (in particular, the bounds $M$ may depend on some measure of how "complex" $\mathcal{B}$ is). Either this argument works, or it encounters some obstruction. The idea is then to show that the obstruction forces the existence of a new σ-algebra $\mathcal{B}'$ which is finer than $\mathcal{B}$ (and typically more complex than $\mathcal{B}$) and has slightly more energy. One then replaces $\mathcal{B}$ by $\mathcal{B}'$ and repeats the above strategy, hoping to use the trivial bound (30) to show that the argument must eventually work relative to some σ-algebra.
The difficulty with this strategy is that the energy increment obtained by this method typically depends on the complexity of the σ-algebra $\mathcal{B}$, which tends to grow rather quickly. As such it is possible for this method to get bogged down at some intermediate energy in the range (30) and not terminate in any controlled amount of time. To get around this, it turns out that instead of just using a pair $\mathcal{B} \subseteq \mathcal{B}'$ of σ-algebras, it is better to use a triplet $\mathcal{B} \subseteq \mathcal{B}' \subseteq \mathcal{B}''$ of σ-algebras, with the energy gap between $\mathcal{B}$ and $\mathcal{B}'$ allowed to be moderately large (bounded by a quantity that does not depend on the complexity of any of these algebras). The idea is then to try to prove $P(M)$ relative to the pair $(\mathcal{B}, \mathcal{B}')$, but using bounds which depend only on the complexity of $\mathcal{B}$ and not on $\mathcal{B}'$. If the argument encounters an obstruction, then one can replace $\mathcal{B}'$ by a more complex $\mathcal{B}''$, with an energy increment again depending only on the complexity of $\mathcal{B}$; thus this energy increment will not go to zero as $\mathcal{B}'$ becomes more complex. There is now a second obstruction when the energy gap between $\mathcal{B}'$ and $\mathcal{B}$ becomes too large, but then one replaces $\mathcal{B}$ by $\mathcal{B}'$; this can only occur a finite number of times because we do not allow the bound for this energy gap to depend on the complexity, and thus the energy increment here is bounded from below by a fixed constant.

To make this argument more precise we encapsulate it in the following abstract lemma (which has a certain resemblance to Zorn's lemma, and can in fact be thought of as a "quantitative" version of that lemma; it also resembles the proof of the Szemeredi regularity lemma). We are indebted to Ben Green for suggesting the use of this type of energy incrementation argument, which is for instance used in our joint paper [21] to establish arbitrarily long arithmetic progressions in the primes.

Lemma 7.2 (Abstract energy incrementation argument). Suppose there is a property $P(M)$ which can depend on some parameter $M > 0$. Let $d \geq 0$, and let $\vec f = (f_1, \ldots, f_m)$ be a collection of $m$ bounded functions.

Suppose also that we have a $\tau > 0$ for which the following dichotomy holds: for any $X, X' > 0$, given any σ-algebra $\mathcal{B}$ which is compact of order $d$ with complexity at most $X$, and any σ-algebra $\mathcal{B}'$ which is finer than $\mathcal{B}$ and also compact of order $d$ with complexity at most $X'$, if the energy gap condition

$$\mathcal{E}_{\vec f}(\mathcal{B}') - \mathcal{E}_{\vec f}(\mathcal{B}) \leq \tau^2 \qquad (32)$$

holds, then either $P(M)$ is true for some $M = O_{d,\tau,X,X'}(1)$, or we can find a σ-algebra $\mathcal{B}''$ finer than $\mathcal{B}'$ which is compact of order $d$ with complexity at most $O_{d,\tau,X,X'}(1)$ such that we have the energy increment property

$$\mathcal{E}_{\vec f}(\mathcal{B}'') - \mathcal{E}_{\vec f}(\mathcal{B}') \geq c(d,\tau,X) > 0 \qquad (33)$$

for some positive quantity $c(d,\tau,X)$ which does not depend on $X'$. Then $P(M)$ is true for some $M = O_{m,d,\tau}(1)$.

Remark 7.3. The point of this lemma is that it reduces the task of proving some property $P(M)$ to the easier task of proving a dichotomy: either $P$ can be proven, or we can increment the energy of a certain σ-algebra while keeping the complexity under control. It is crucial that $\tau$ does not depend on the complexities $X, X'$, and that the energy increment $c(d,\tau,X)$ depends only on the lower complexity $X$ and not on the higher complexity $X'$; otherwise this lemma fails. Note that no quantitative knowledge of the growth of this complexity or of the energy increment bound $c(d,\tau,X)$ is necessary, although of course the explicit form of the final bound $O_{m,d,\tau}(1)$ on $M$ will depend quite heavily on those growth rates. This argument proceeds by a double iteration and thus typically produces bounds which are of Ackermann type or worse, but in principle they are computable.

Proof. The proof proceeds by running the following double iteration algorithm, constructing a pair of σ-algebras $\mathcal{B}$ and $\mathcal{B}'$, both compact of order $d$ and with $\mathcal{B}'$ finer than $\mathcal{B}$, as follows.

Step 1 Initialize $\mathcal{B}$ to the trivial algebra $\mathcal{B} := \{\emptyset, \mathbb{Z}_N\}$.

Step 2 Initialize $\mathcal{B}'$ to equal $\mathcal{B}$ (thus trivially verifying the energy gap condition (32)). Let $X$ denote the complexity of $\mathcal{B}$.

Step 3 Let $X'$ denote the complexity of $\mathcal{B}'$. If $P(M)$ is true for some $M = O_{d,\tau,X,X'}(1)$ then we halt the algorithm. Otherwise, we must by hypothesis be able to locate a σ-algebra $\mathcal{B}''$ which is compact of order $d$ with complexity at most $O_{d,\tau,X,X'}(1)$ with the energy increment property (33), and we continue on to Step 4.

Step 4 If $\mathcal{E}_{\vec f}(\mathcal{B}'') - \mathcal{E}_{\vec f}(\mathcal{B}) \leq \tau^2$, then we replace $\mathcal{B}'$ by $\mathcal{B}''$ (thus preserving (32)) and return to Step 3. Otherwise, we replace $\mathcal{B}$ by $\mathcal{B}''$ and return to Step 2.

Observe that for each fixed $\mathcal{B}$ of complexity $X$, the algorithm can only iterate at most $O_{d,\tau,X}(1)$ times before changing $\mathcal{B}$. This is because every time $\mathcal{B}'$ is changed, the energy $\mathcal{E}_{\vec f}(\mathcal{B}')$ increases by at least $c(d,\tau,X)$, but if the energy ever exceeds $\mathcal{E}_{\vec f}(\mathcal{B}) + \tau^2$ then we must change $\mathcal{B}$. Note that it is crucial here that the energy increment $c(d,\tau,X)$ not depend on the complexity $X'$ of $\mathcal{B}'$, which may be growing quite rapidly during this iteration process. In particular, if $\mathcal{B}$ finally does change, its complexity will increase from $X$ to at most $O_{d,\tau,X}(1)$. Next, observe that $\mathcal{B}$ can only be changed at most $O_{m,\tau}(1)$ times, because each time we change $\mathcal{B}$, the energy $\mathcal{E}_{\vec f}(\mathcal{B})$ increases by at least $\tau^2$, but the energy is always non-negative and is bounded by $m$. Combining these two observations we see that the entire algorithm must halt in $O_{m,d,\tau}(1)$ steps and all σ-algebras constructed by the algorithm have complexity at most $O_{m,d,\tau}(1)$. The claim follows. □
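To see the control flow of Steps 1 to 4 in isolation, here is a toy Python simulation. It is a hypothetical model, not part of the argument: a σ-algebra is replaced by the pair (energy, complexity), the dichotomy is replaced by an artificial oracle that reports success exactly when no further increment fits under the energy cap $m$, and the functions `c` and `grow` stand in for $c(d,\tau,X)$ and the complexity bound $O_{d,\tau,X,X'}(1)$. The run illustrates that the number of changes of $\mathcal{B}$ is at most about $m/\tau^2$, while the number of inner steps grows quickly because `c` depends on $X$.

```python
def energy_increment(m, tau, c, grow):
    """Toy run of Steps 1-4 (hypothetical model: an algebra is just (energy, complexity)).

    The stand-in 'oracle' succeeds (P(M) true) exactly when E(B') + c(X) would exceed
    the cap m; otherwise it returns B'' with energy E(B') + c(X) and complexity grow(X').
    """
    E_B, X = 0.0, 0                      # Step 1: B := trivial algebra
    outer = inner = 0
    while True:
        E_Bp, Xp = E_B, X                # Step 2: B' := B, so (32) holds trivially
        outer += 1
        while True:
            # Step 3: either success, or a finer B'' with energy increment c(X)
            if E_Bp + c(X) > m:
                return {"outer": outer, "inner": inner, "complexity": max(X, Xp)}
            E_Bpp, Xpp = E_Bp + c(X), grow(Xp)
            inner += 1
            # Step 4: small gap -> refine B'; large gap -> B := B'' and restart at Step 2
            if E_Bpp - E_B <= tau ** 2:
                E_Bp, Xp = E_Bpp, Xpp
            else:
                E_B, X = E_Bpp, Xpp
                break

result = energy_increment(m=1.0, tau=0.3, c=lambda X: 0.05 / (1 + X), grow=lambda Xp: Xp + 1)
print(result["outer"], result["inner"], result["complexity"])
```

Even with these mild choices the inner loop needs thousands of steps, a small preview of the Ackermann-type growth mentioned in Remark 7.3.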

8 Proof of the structure theorem 

We now prove the structure theorem, Theorem 3.5. Naively, the idea would be to take $\mathcal{B}$ to be the σ-algebra formed by all the $UAP^{d-1}$ functions, and then take $f_U = f - \mathbb{E}(f|\mathcal{B})$ and $f_{U^\perp} = \mathbb{E}(f|\mathcal{B})$ to conclude the result (the uniformity of $f_U$ arising from Proposition 5.11, and Proposition 6.6 being used to locate $f_{UAP}$); this would be the exact analogue of how one would proceed in the genuinely ergodic setting when the underlying space is infinite and one does not care about quantitative control on the complexity of the σ-algebra. Unfortunately this approach does not work because there are far too many $UAP^{d-1}$ functions available, and the complexity of $\mathcal{B}$ would explode with $N$ (indeed, it is likely that $\mathcal{B}$ would simply be the total σ-algebra consisting of arbitrary subsets of $\mathbb{Z}_N$). Thus, in the quantitative setting, one must be substantially more "choosy" about which $UAP^{d-1}$ functions to admit into the algebra $\mathcal{B}$: they should only be the ones which have a good reason for being there, such as having a non-trivial correlation with the function $f$. It turns out that the best framework for doing this is given by the abstract energy incrementation argument of the previous section, exploiting the fact that each function that one adds to the σ-algebra increases the energy of that algebra, especially if there is a correlation with $f$. As such, the proof of this theorem does not actually require one to know what the function $c_0(d, k, \delta, M)$ in Theorem 3.3 actually is (although this function will of course influence the final bound on $M$), and so we can prove this theorem before proving the (somewhat more difficult) recurrence theorem, Theorem 3.3, in the next section.

In view of the energy incrementation argument, it suffices to prove the following dichotomy:

Lemma 8.1 (Structure theorem dichotomy). Let $k \geq 3$, and let $f$ be a non-negative bounded function obeying (2) for some $\delta > 0$. Let $\mathcal{B} \subseteq \mathcal{B}'$ be σ-algebras which are compact of order $k-2$ with complexity at most $X, X'$ respectively, and which obey the energy gap condition (32) with $\tau := \delta/5000k$. Then at least one of the following must be true:

• (Success) We can find a positive number $M = O_{k,\delta,X}(1)$, a bounded function $f_U$, and non-negative bounded functions $f_{U^\perp}, f_{UAP}$ such that we have the splitting $f = f_U + f_{U^\perp}$ and the estimates (4), (5), (6) with $d := k-2$, as well as the Gowers uniformity estimate (8).

• (Energy increment) We can find a σ-algebra $\mathcal{B}''$ finer than $\mathcal{B}'$ which is compact of order $d$ and complexity $O_{k,\delta,X,X'}(1)$ such that

$$\mathcal{E}_f(\mathcal{B}'') - \mathcal{E}_f(\mathcal{B}') \geq c(k,\delta,X) > 0 \qquad (34)$$

for some $c(k,\delta,X) > 0$ independent of $X'$.

Indeed, Theorem 3.5 follows immediately by applying Lemma 8.1 to Lemma 7.2 (using $m = 1$, the bounded function $f$, and $\tau := \delta/5000k$).

Proof of Lemma 8.1. Fix $\mathcal{B}, \mathcal{B}'$. Since $\mathbb{E}(f|\mathcal{B})$ is non-negative and bounded, and $\mathcal{B}$ is compact of order $k-2$ with complexity $O(X)$, we may apply Proposition 6.6 to find a non-negative bounded function $f_{UAP}$ such that

$$\|\mathbb{E}(f|\mathcal{B}) - f_{UAP}\|_{L^2} \leq \frac{\delta}{5000k} \qquad (35)$$
and

$$\|f_{UAP}\|_{UAP^{k-2}} \leq M$$

for some $M = O_{k,\delta,X}(1)$, which we now fix. From (35) and Cauchy-Schwarz we observe that

$$|\mathbb{E}(f|\mathbb{Z}_N) - \mathbb{E}(f_{UAP}|\mathbb{Z}_N)| = |\mathbb{E}(\mathbb{E}(f|\mathcal{B}) - f_{UAP}|\mathbb{Z}_N)| \leq \frac{\delta}{5000k}$$

and in particular (by (2))

$$\mathbb{E}(f_{UAP}|\mathbb{Z}_N) \geq \delta/2.$$

Now split $f = f_U + f_{U^\perp}$, where $f_{U^\perp} := \mathbb{E}(f|\mathcal{B}')$ and $f_U := f - \mathbb{E}(f|\mathcal{B}')$. We have already proven the estimates (5), (6), while (4) follows from (32) (recall $\tau = \delta/5000k$), (31), (35), and the triangle inequality. If the estimate (8) held then we would now be done (in the "Success" half of the dichotomy), so suppose instead that

$$\|f_U\|_{U^{k-1}} > 2^{-k} c(k-2, M, M).$$

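By Lemma 5.11, we can thus find a function $G \in UAP^{k-2}$ with $\|G\|_{UAP^{k-2}} \leq 1$ such that

$$|\langle f_U, G \rangle| \geq c(k,\delta,M) > 0 \qquad (36)$$

for some positive quantity $c(k,\delta,M)$. Now write $\mathcal{B}'' := \mathcal{B}' \vee \mathcal{B}_\varepsilon(G)$, where $\varepsilon = \varepsilon(k,\delta,M) > 0$ is to be chosen later. We thus split $f_U = (f - \mathbb{E}(f|\mathcal{B}'')) + (\mathbb{E}(f|\mathcal{B}'') - \mathbb{E}(f|\mathcal{B}'))$ and $G = (G - \mathbb{E}(G|\mathcal{B}'')) + (\mathbb{E}(G|\mathcal{B}'') - \mathbb{E}(G|\mathcal{B}')) + \mathbb{E}(G|\mathcal{B}')$. The first terms in both expansions are orthogonal to $\mathcal{B}''$ (and thus to $\mathcal{B}'$), the second terms are measurable in $\mathcal{B}''$ and orthogonal to $\mathcal{B}'$, and the third term of $G$ is measurable in $\mathcal{B}'$. Thus

$$\langle f_U, G \rangle = \langle f - \mathbb{E}(f|\mathcal{B}''), G - \mathbb{E}(G|\mathcal{B}'') \rangle + \langle \mathbb{E}(f|\mathcal{B}'') - \mathbb{E}(f|\mathcal{B}'), \mathbb{E}(G|\mathcal{B}'') - \mathbb{E}(G|\mathcal{B}') \rangle.$$

From (20) and the boundedness of $f$ we have

$$|\langle f - \mathbb{E}(f|\mathcal{B}''), G - \mathbb{E}(G|\mathcal{B}'') \rangle| \leq O(\varepsilon).$$

Thus if we choose $\varepsilon$ sufficiently small depending on $k, \delta, M$, we see from (36) that

$$|\langle \mathbb{E}(f|\mathcal{B}'') - \mathbb{E}(f|\mathcal{B}'), \mathbb{E}(G|\mathcal{B}'') - \mathbb{E}(G|\mathcal{B}') \rangle| \geq \tfrac{1}{2} c(k,\delta,M).$$

Since $G$ is bounded, we thus see from Cauchy-Schwarz that

$$\|\mathbb{E}(f|\mathcal{B}'') - \mathbb{E}(f|\mathcal{B}')\|_{L^2} \geq \tfrac{1}{2} c(k,\delta,M) > 0.$$

But then (34) follows from (31). Finally, the complexity bound on $\mathcal{B}''$ follows from Definition 6.4, the complexity bound on $\mathcal{B}'$, and the choice of $\varepsilon$ and $M$. We are thus in the "Energy increment" half of the dichotomy, and the lemma follows. □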
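The splitting of $\langle f_U, G \rangle$ used above is an exact identity whenever $\mathcal{B}' \subseteq \mathcal{B}''$: the four cross terms vanish by orthogonality. A small exact check in Python (hypothetical data on $\mathbb{Z}_{10}$, with `B1`, `B2` playing the roles of $\mathcal{B}'$, $\mathcal{B}''$; the helper functions are our own):

```python
from fractions import Fraction

def cond_exp(f, atoms):
    """E(f|B) on Z_N: average f over the atom containing each point."""
    g = [None] * len(f)
    for atom in atoms:
        avg = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            g[x] = avg
    return g

def inner(u, v):
    """Real inner product <u, v> = E(u v | Z_N)."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

f = [Fraction(v) for v in [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]]          # bounded, non-negative
G = [Fraction(v, 4) for v in [3, -1, 2, 0, -4, 1, 4, -2, 0, 1]]     # bounded by 1
B1 = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]                              # role of B'
B2 = [[0, 1], [2, 3, 4], [5, 6, 7], [8, 9]]                          # role of B'' (finer)

fU = sub(f, cond_exp(f, B1))
lhs = inner(fU, G)
rhs = (inner(sub(f, cond_exp(f, B2)), sub(G, cond_exp(G, B2)))
       + inner(sub(cond_exp(f, B2), cond_exp(f, B1)), sub(cond_exp(G, B2), cond_exp(G, B1))))
print(lhs == rhs)  # True
```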
The proof of the structure theorem is now complete.

Remark 8.2. It may be possible to prove this structure theorem more directly, without explicitly invoking σ-algebras, for instance by setting up an extremization problem such as that of minimizing the $U^{k-1}$ norm of $f_U$ subject to the constraints (4), (5), (6), the splitting $f = f_U + f_{U^\perp}$, and the bounded non-negativity of $f_{UAP}$ and $f_{U^\perp}$. We were unable, however, to achieve this in a clean way, especially when it came to maintaining the boundedness and non-negativity conditions, whereas the conditional expectation method achieves this more painlessly.



9 Compactness on atoms, and an application of van der Waerden's theorem

To prove Szemeredi's theorem, the only thing that now remains is to prove the recurrence theorem for almost periodic functions, Theorem 3.3. In this section we present a key Proposition, which illustrates the applicability of van der Waerden's theorem (Theorem 1.1) to the problem of obtaining recurrence for a function $f$ whose shifts $T^nf$ enjoy a representation such as (15). The key idea is that the functions on the right-hand side of (15) live in a sufficiently "compact" space of functions that they can be "finitely coloured", at which point van der Waerden's theorem can be used to establish recurrence¹². As we show at the end of this section, we can quickly use this Proposition to deduce the $d = 1$ case (as well as the rather trivial $d = 0$ case) of Theorem 3.3 as a corollary.

Proposition 9.1 (Recurrence for conditionally UAP functions). Let $\mathcal{B}$ be a σ-algebra, let $M > 0$, let $H$ be a finite non-empty set, and for each $n \in \mathbb{Z}_N$ and $h \in H$ let $c_{n,h}$ be a bounded $\mathcal{B}$-measurable function and let $g_h$ be a bounded function. Let $\mathbf{h}$ be a random variable taking values in $H$, and define the functions $F_n$ for all $n \in \mathbb{Z}_N$ by the formula

$$F_n := M\,\mathbb{E}(c_{n,\mathbf{h}}\, g_{\mathbf{h}}) \qquad (37)$$

(compare with (15)). Let $f_{U^\perp}$ be a bounded non-negative function, and for any $\delta > 0$, $n \in \mathbb{Z}_N$, and $k, k_* \in \mathbb{Z}^+$ let $E_n(k,\delta,k_*,\mathcal{B}) \in \mathcal{B}$ be the set
E n (k,6,k*,B) := {x e Z N : E(T n f u ± \B)(x) > - and 

2 

V{\T n fu--Fx m \\B){x)< A}. 

¹² This argument was inspired, not by the original ergodic theory arguments of Furstenberg, but by the later colouring-based arguments, for instance in [3]. It may be possible to adapt the older arguments in, say, [14] instead here, which have the advantage of using the same length $k$ for the progression throughout the argument, instead of replacing $k$ by a considerably larger $k_*$ as is done here. This might ultimately lead to somewhat better final bounds, although it seems that one would still get Ackermann-type dependence or worse on the constants.
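Then for every $\delta > 0$ and $k \in \mathbb{Z}^+$ there exists $k_* = K(k,\delta,M)$ such that

$$\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{\mu jr} f_{U^\perp}(x) \,\Big|\, x \in \mathbb{Z}_N,\ 1 \leq r \leq N_0\Big) \geq c(k,\delta,M)\,\mathbb{E}\Big(\mathbb{P}\Big(\bigcap_{m=1}^{k_*} E_{\mu\lambda m}(k,\delta,k_*,\mathcal{B}) \,\Big|\, \mathbb{Z}_N\Big) \,\Big|\, 1 \leq \lambda \leq N_0/k_*\Big) \qquad (39)$$

for all $\mu \in \mathbb{Z}_N$ and $N_0 \geq k_*$, and some $c(k,\delta,M) > 0$.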

Remark 9.2. The point is that this theorem reduces the task of establishing lower bounds for recurrence expressions involving $f_{U^\perp}$ to that of establishing lower bounds for the recurrence behaviour of the $\mathcal{B}$-measurable sets $E_{\mu\lambda m}(k,\delta,k_*,\mathcal{B})$. This is advantageous if $\mathcal{B}$ is "simpler" than the original function $f_{U^\perp}$; in practice, $f_{U^\perp}$ will be approximately an almost periodic function of order $d$, and $\mathcal{B}$ will be a compact σ-algebra of order $d-1$, and thus functions measurable in $\mathcal{B}$ can be approximated by almost periodic functions of one lower order than $d$. This is the key to the proof of Theorem 3.3 we give in the next section, which proceeds by induction on $d$. On the other hand, the bounds on $k_*$ given by our proof involve van der Waerden numbers, which will cause Ackermann-type growth rates or worse in our final bound.

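Proof of Proposition 9.1. To prove (39) it suffices to prove the "localized" version

$$\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{\mu\lambda(a+js)} f_{U^\perp}(x) \,\Big|\, x \in \mathbb{Z}_N;\ 1 \leq a, s \leq k_*/k\Big) \geq c(\delta,k,k_*)\,\mathbb{P}\Big(\bigcap_{m=1}^{k_*} E_{\mu\lambda m}(k,\delta,k_*,\mathcal{B}) \,\Big|\, \mathbb{Z}_N\Big) \qquad (40)$$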
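for each $1 \leq \lambda \leq N_0/k_*$. Indeed, if (40) held then upon averaging in $\lambda$ we obtain

$$\mathbb{E}\Big(\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{\mu\lambda(a+js)} f_{U^\perp} \,\Big|\, \mathbb{Z}_N\Big) \,\Big|\, 1 \leq \lambda \leq N_0/k_*;\ 1 \leq a,s \leq k_*/k\Big) \geq c(\delta,k,k_*)\,\mathbb{E}\Big(\mathbb{P}\Big(\bigcap_{m=1}^{k_*} E_{\mu\lambda m}(k,\delta,k_*,\mathcal{B}) \,\Big|\, \mathbb{Z}_N\Big) \,\Big|\, 1 \leq \lambda \leq N_0/k_*\Big).$$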

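The shift $T^{\mu\lambda a}$ can be factored out of the product and makes no difference to the expectation, thus it can be discarded. The $a$ averaging then becomes redundant, and we obtain

$$\mathbb{E}\Big(\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{\mu\lambda js} f_{U^\perp} \,\Big|\, \mathbb{Z}_N\Big) \,\Big|\, 1 \leq \lambda \leq N_0/k_*;\ 1 \leq s \leq k_*/k\Big) \geq c(\delta,k,k_*)\,\mathbb{E}\Big(\mathbb{P}\Big(\bigcap_{m=1}^{k_*} E_{\mu\lambda m}(k,\delta,k_*,\mathcal{B}) \,\Big|\, \mathbb{Z}_N\Big) \,\Big|\, 1 \leq \lambda \leq N_0/k_*\Big).$$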
The claim (39) then follows by observing that every $1 \leq r \leq N_0$ has at most $O_{k_*,k}(1)$ representations of the form $r = \lambda s$ with $1 \leq \lambda \leq N_0/k_*$ and $1 \leq s \leq k_*/k$. (The dependence on $k_*$ is ultimately irrelevant since $k_*$ itself will ultimately depend only on $\delta, k, M$.)

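It remains to prove (40). Fix $\mu, \lambda$. By absorbing $\mu$ into $\lambda$ we may take $\mu = 1$ (we will not use the upper or lower bound on $\lambda$). Since $\bigcap_{m=1}^{k_*} E_{\lambda m}(k,\delta,k_*,\mathcal{B})$ is measurable in $\mathcal{B}$, it is the union of atoms $A \in \mathcal{B}$. It will suffice to prove the pointwise estimate

$$\mathbb{E}\Big(\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{\lambda(a+js)} f_{U^\perp} \,\Big|\, A\Big) \,\Big|\, 1 \leq a,s \leq k_*/k\Big) \geq c(\delta,k,k_*)$$

for each such atom, as the claim then follows by multiplying this formula by $\mathbb{P}(A|\mathbb{Z}_N)$ and summing over all atoms in $\bigcap_{m=1}^{k_*} E_{\lambda m}(k,\delta,k_*,\mathcal{B})$.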

Now fix the atom $A$. Since the number of pairs $(a, s)$ is $O_{k,k_*}(1)$, it suffices to locate a single pair $(a, s)$ with $1 \leq a, s \leq k_*/k$ such that

$$\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{\lambda(a+js)} f_{U^\perp} \,\Big|\, A\Big) \geq c(\delta,k) \qquad (41)$$

for some $c(\delta, k) > 0$.

We now pass from the shifts $T^{\lambda(a+js)} f_{U^\perp}$ to the functions $F_{\lambda(a+js)}$. We claim that to prove (41) it would suffice to prove that

$$\|F_{\lambda(a+js)} - F_{\lambda a}\|_{L^2(A)} \leq \frac{\delta}{8k} \quad \text{for all } 0 \leq j \leq k-1, \qquad (42)$$

where $L^2(A)$ is the Hilbert space given by the norm $\|F\|_{L^2(A)} := \mathbb{E}(|F|^2|A)^{1/2}$. To see this claim, observe from Cauchy-Schwarz that (42) implies

$$\mathbb{E}(|F_{\lambda(a+js)} - F_{\lambda a}|\,|A) \leq \frac{\delta}{8k} \quad \text{for all } 0 \leq j \leq k-1.$$
But by (38) and the choice of A, we also have 

E(|fA(a+j.) - T^+^/^HA) < A for all < j < k - 1. 
From the triangle inequality we thus have 

E(|T A < a+ ^/ - T Xa f v ± \\A) < — for all < j < k - 1. 
This in particular implies 

15(5 

E(l |TMo+ , 5) v _ T A. /[;1 |>iT^/^7|A) < for all < j < k - 1. 
On the other hand, by (38) again we have 

$$\mathbb{E}(T^{\lambda a} f_{U^\perp}|A) \geq \delta/2.$$
Thus

^' i -\TMa+j3)f u± _ T Xaf u± \<i T \af for all 0<j<fe-l^ "/^ ± ) - 3^ - 

By Hölder's inequality and the non-negativity of $f_{U^\perp}$ this implies that

E ( 1 \TMa+is) fu± _ T Xa fu _ Ll <* T Xa fu± fo r all 0< j < k- 1 ( T "A/^ ) ) ^ (ggfc) " 

The claim (41) then follows from the elementary pointwise inequality 

^k 

Y[T X{a+:)S \f U ± > - 1 \ T M«+3s) fu± _ T Xa fu±l <4 T Xa fu _ L for all 0< j<k-l C 7 ^"/^ ) * • 

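It remains to find a pair $(a, s)$ obeying (42). Using (37) it suffices to find an $(a, s)$ such that

$$\|\mathbb{E}(c_{\lambda(a+js),\mathbf{h}}\, g_{\mathbf{h}}) - \mathbb{E}(c_{\lambda a,\mathbf{h}}\, g_{\mathbf{h}})\|_{L^2(A)} \leq \delta/8Mk \quad \text{for all } 0 \leq j \leq k-1. \qquad (43)$$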

Note that as the $c_{n,h}$ are measurable with respect to $\mathcal{B}$, they are constant on $A$, and so without loss of generality we can treat them just as bounded complex numbers (this is the whole point of working on individual atoms in the first place). The $g_h$ are not constant, but we can think of them as bounded functions on $A$.

To proceed further we need the following compactness property of averages of the form $\mathbb{E}(c_h g_h)$ in $L^2(A)$.

Lemma 9.3 (Total boundedness property). There exist integers $1 \leq m_1, \ldots, m_L \leq k^*$ for some $L \leq C(k,M,\delta)$ such that

$$\inf_{1 \leq l \leq L} \|\mathbb{E}(c_{\lambda m,h}\, g_h) - \mathbb{E}(c_{\lambda m_l,h}\, g_h)\|_{L^2(A)} \leq \frac{\delta}{16Mk} \quad \text{for all } 1 \leq m \leq k^*.$$

Remark 9.4. The key point here is that the bound on $L$ does not depend on the size of $H$, $A$, or $N$. This is a quantitative analogue of the basic result (used in the ergodic theory proofs, see e.g. [14]) that a Volterra integral operator from one finite measure space to another is necessarily a compact operator in $L^2$, and thus the image of any bounded set can be covered by a finite number of $\delta$-balls in $L^2$.

Proof. Let us write $f_m := \mathbb{E}(c_{\lambda m,h}\, g_h)|_A$. We construct an orthonormal system of functions $v_1, v_2, \ldots, v_J$ in $L^2(A)$ by performing the following algorithm, which can be viewed as a rudimentary version of the energy increment algorithm discussed in previous sections (with the role of $\sigma$-algebras replaced by the simpler notion of finite-dimensional subspaces of a Hilbert space).

Step 0 Initialize $J = 0$.

Step 1 Let $V \subseteq L^2(A)$ be the subspace spanned by $v_1, \ldots, v_J$ (so initially this will be the trivial space $\{0\}$).

Step 2 If there exists a $1 \leq m \leq k^*$ such that $\mathrm{dist}_{L^2(A)}(f_m, V) > \delta/64Mk$, then by Hilbert space geometry we can find a unit vector $v_{J+1}$ orthogonal to $V$ (and thus to all of $v_1, \ldots, v_J$) such that $|\langle f_m, v_{J+1}\rangle_{L^2(A)}| > \delta/64Mk$. In such a case, we choose$^{13}$ such a $v_{J+1}$, increment $J$, and return to Step 1. Otherwise, we terminate the algorithm.

We claim that this algorithm terminates in $O_{k,M,\delta}(1)$ steps. Indeed, for each $v_j$ generated by this algorithm, we see from construction that there exists an $m = m(j)$ with $1 \leq m \leq k^*$ such that

$$|\mathbb{E}(c_{\lambda m,h} \langle g_h, v_j\rangle_{L^2(A)})| = |\langle f_m, v_j\rangle_{L^2(A)}| > \delta/64Mk.$$

Here we have crucially taken advantage of the fact that $c_{\lambda m,h}$ is constant on $A$. Since $c_{\lambda m,h}$ is bounded, we thus see from the Cauchy-Schwarz inequality that

$$\mathbb{E}(|\langle g_h, v_j\rangle_{L^2(A)}|^2) > \Big(\frac{\delta}{64Mk}\Big)^2.$$

Summing this in $j$, we obtain

$$\mathbb{E}\Big(\sum_{j=1}^{J} |\langle g_h, v_j\rangle_{L^2(A)}|^2\Big) > J\Big(\frac{\delta}{64Mk}\Big)^2.$$

But from the boundedness of the $g_h$, the orthonormality of the $v_j$, and Bessel's inequality, the left-hand side is at most 1. Thus $J < (64Mk/\delta)^2 = O_{k,M,\delta}(1)$, as claimed.

Now observe from the construction of the algorithm that all the functions $f_m$ will lie within $\delta/64Mk$ (in the $L^2(A)$ metric) of the $J$-dimensional space $V$. In particular, we see from the triangle inequality, the crude bound $\|f_m\|_{L^2(A)} \leq 1$ arising from our bounds on $c_{n,h}$ and $g_h$, and finite-dimensional geometry that there can be at most $O_{k,M,\delta,J}(1) = O_{k,M,\delta}(1)$ functions $f_{m_1}, \ldots, f_{m_L}$ which are all separated from each other by at least $\delta/16Mk$ in the $L^2(A)$ metric. The claim now follows by the usual greedy algorithm. □
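Steps 0 to 2 amount to an incremental Gram-Schmidt procedure. A minimal real-valued sketch, assuming $L^2(A)$ is modelled by vectors with the normalized inner product and taking the normalized residual of $f_m$ as the chosen unit vector $v_{J+1}$ (one valid choice; the proof only needs some such vector). The data `g`, `c` are hypothetical, built so that $f_m = \mathbb{E}_h(c_{m,h} g_h)$ with $|c_{m,h}|, |g_h| \leq 1$, which is the structure the Bessel argument relies on; `theta` plays the role of $\delta/64Mk$:

```python
import math
import random


def inner(u, v):
    """Normalized real inner product <u, v>_{L^2(A)}: the mean of u(x) v(x) over A."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def residual(f, basis):
    """f minus its orthogonal projection onto span(basis); basis must be orthonormal."""
    r = list(f)
    for v in basis:
        coeff = inner(f, v)
        r = [ri - coeff * vi for ri, vi in zip(r, v)]
    return r


def orthonormal_system(fs, theta):
    """While some f_m is farther than theta from V = span(basis), append the
    normalized residual of f_m, a unit vector orthogonal to V."""
    basis = []  # Step 0: J = 0
    while True:
        for f in fs:  # Step 2: look for an f_m with dist(f_m, V) > theta
            r = residual(f, basis)
            dist = math.sqrt(inner(r, r))
            if dist > theta:
                basis.append([x / dist for x in r])  # <f_m, v_{J+1}> = dist > theta
                break
        else:
            return basis  # every f_m is within theta of V: terminate


rng = random.Random(1)
n_points, n_h, n_m = 40, 20, 30
g = [[rng.uniform(-1, 1) for _ in range(n_points)] for _ in range(n_h)]
c = [[rng.uniform(-1, 1) for _ in range(n_h)] for _ in range(n_m)]
fs = [[sum(c[m][h] * g[h][x] for h in range(n_h)) / n_h for x in range(n_points)]
      for m in range(n_m)]
theta = 0.05
basis = orthonormal_system(fs, theta)
print(len(basis), "<=", 1 / theta ** 2)
```

The Bessel bound $J < 1/\theta^2$ holds for any data of this shape, independently of the number of points or of $|H|$, which is the point of Remark 9.4.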

Using this lemma, we can introduce a colouring function $c: \{1,\ldots,k^*\} \to \{1,\ldots,L\}$ by

$$c(m) := \inf\{1 \leq l \leq L : \|\mathbb{E}(c_{\lambda m,h}\, g_h) - \mathbb{E}(c_{\lambda m_l,h}\, g_h)\|_{L^2(A)} \leq \delta/16Mk\}.$$

By van der Waerden's theorem, if $k^* = k^*(k,L) = k^*(k,\delta,M)$ is chosen sufficiently large, then we can find $1 \leq a, s \leq k^*/k$ such that the progression $a, a+s, \ldots, a+(k-1)s$ is monochromatic. The claim (43) now follows from the triangle inequality. This concludes the proof of Proposition 9.1. □
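The covering, colouring and van der Waerden steps can be sketched on toy data. Here scalars stand in for the functions $f_m$ (an assumption made only to keep the example small), the net is built by the greedy algorithm mentioned at the end of the proof of Lemma 9.3, and the monochromatic progression is found by brute-force search rather than by invoking van der Waerden's theorem; `eps` plays the role of $\delta/16Mk$:

```python
def greedy_net(points, eps, dist):
    """Greedily pick representatives so that every point is within eps of one of them."""
    reps = []
    for i, p in enumerate(points):
        if all(dist(p, points[r]) > eps for r in reps):
            reps.append(i)
    return reps


def colouring(points, reps, eps, dist):
    """c(m) = least l (1-based) with dist(points[m], points[reps[l - 1]]) <= eps."""
    return [next(l for l, r in enumerate(reps, start=1) if dist(p, points[r]) <= eps)
            for p in points]


def monochromatic_ap(colour, k):
    """Return some (a, s) with 1 <= a, s <= K // k such that colour is constant on
    a, a + s, ..., a + (k - 1) s (positions 1-based, K = len(colour)), or None."""
    K = len(colour)
    for s in range(1, K // k + 1):
        for a in range(1, K // k + 1):
            if len({colour[a + j * s - 1] for j in range(k)}) == 1:
                return a, s
    return None


def dist(u, v):
    return abs(u - v)


points = [float(m % 3) for m in range(1, 13)]  # toy stand-ins for f_1, ..., f_12
eps = 0.1
reps = greedy_net(points, eps, dist)
colour = colouring(points, reps, eps, dist)
a, s = monochromatic_ap(colour, 3)
# Triangle inequality: one colour class lies within 2 * eps of a common representative.
assert all(dist(points[a - 1], points[a + j * s - 1]) <= 2 * eps for j in range(3))
print(reps, colour[:6], (a, s))  # [0, 1, 2] [1, 2, 3, 1, 2, 3] (1, 3)
```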

13 Note that since $m$ ranges over a finite set, the axiom of choice is not needed here, since $Z_N$ is clearly well-ordered. Because we are always in a finite (or at least finite-dimensional) setting, similar considerations apply to other parts of the argument in which an arbitrary choice has to be made.

As a quick corollary of this Proposition we can now prove at least the $d = 1$ case of Theorem 3.3.

Proof of Theorem 3.3 when $d = 1$. Let $f_{U^\perp}, f_{UAP}, k, M, \delta, \varepsilon$ be as in the Theorem. From (6) and Definition 5.2 we can find a finite non-empty set $H$, a collection of bounded constants $(c_{n,h})_{n \in Z_N, h \in H}$ and bounded functions $(g_h)_{h \in H}$, and a random variable $h$ taking values in $H$ such that we have the representation (37), where $F_n := T^n f_{UAP}$. We thus apply Proposition 9.1 with $N_0 := N_1$ and $\mathcal{B}$ set equal to the trivial $\sigma$-algebra $\mathcal{B} = \{\emptyset, Z_N\}$, since the $c_{n,h}$ are all almost periodic of order 0 and hence constant. But by (38), we see that $E_{\lambda m}(k,\delta,k^*,\mathcal{B})$ is either the empty set or all of $Z_N$, with the latter occurring if

$$\mathbb{E}(T^{\lambda m} f_{U^\perp} | Z_N) \geq \delta/2 \quad \text{and} \quad \mathbb{E}(|T^{\lambda m} f_{U^\perp} - F_{\lambda m}| \,|\, Z_N) \leq \frac{\delta}{32k}.$$

But the former condition is automatic from (5), while the latter follows from (4), Cauchy-Schwarz, and the choice of $\varepsilon$; note that the shift $T^{\lambda m}$ has no effect on the expectation $\mathbb{E}(\cdot | Z_N)$. Thus $E_{\lambda m}(k,\delta,k^*,\mathcal{B}) = Z_N$ for all $\lambda$, and the claim (7) follows from (39). □

Remark 9.5. As we shall see, the $d > 1$ case is somewhat more complicated, the problem being that one has to somehow "quotient out" the effect of a very large number of almost periodic functions of order $d-1$ before the property of being almost periodic of order $d$ emerges as a usable property. Unfortunately this appears to be rather necessary, even when $d = 2$, at least with the arguments currently available; the author would consider this issue one of the least well understood components of the theory. Consider for instance a function $f_{AP}$ of the form $f_{AP}(x) = \psi(x^2/N)$, where $\psi: \mathbb{R}/\mathbb{Z} \to [0,1]$ is a smooth bounded non-negative function which is periodic with period 1, which equals 1 on the interval $[-\delta, \delta]$, and vanishes outside $[-2\delta, 2\delta]$. This function can be shown to be almost periodic of order 2 with a $UAP^2$ norm of $O_\delta(1)$. Thus Theorem 3.3 should allow us to locate a large number of arithmetic progressions of length $k$ in the support of $\psi$, for reasonably large values of $k$ (e.g. $k = 5$). To actually establish even this special case, however, seems rather difficult, the simplest proof probably being the ergodic theory proof that lifts this problem up to establishing recurrence for the skew shift on the two-dimensional torus. Similarly for more complicated examples such as (17) (now the ergodic system is a two-step nilsystem, formed by quotienting the unipotent upper triangular $3 \times 3$ matrices by the subgroup of matrices with integer coefficients). In [19] this precise problem was encountered, and solved by using very directly the number-theoretic structure of $x^2/N$ (and similar polynomial objects), in particular a quantitative version of Weyl's theorem on the uniform distribution of polynomials. The problem of having to deal with generalized polynomials instead of polynomials was avoided by working on relatively short arithmetic progressions, in which one could approximate the former by the latter.

10 Recurrence for almost periodic functions



We now conclude the proof of Theorem 3.3, and thus of Theorem 1.2. We have already handled the $d = 1$ case. The case $d = 0$ can either be deduced from the $d = 1$ case, or can be worked out directly by an easy argument which we leave to the reader (the point being that $f_{UAP}$ is now constant and $f_{U^\perp}$ and its shifts will have to be larger than, say, $\delta/2$ with very high probability, say at least $1 - 1/2k$). Thus it remains to handle the $d > 1$ cases. We may assume as an inductive hypothesis that $d$ is fixed and the claim has already been proven for $d-1$.

When $\mu = 0$ the claim follows easily from (5) and the boundedness of $f_{U^\perp}$, so we may take $\mu \neq 0$. But then we may rescale by $\mu$ and set $\mu = 1$.

We would like to apply Proposition 9.1 as we did in the $d = 1$ case. The difficulty now is that the functions $c_{n,h}$ generated by Definition 5.2 are no longer constant, but are themselves almost periodic of one lower order, $d-1$. The strategy is then to locate a $\sigma$-algebra $\mathcal{B}$ generated by such functions (and hence compact of order $d-1$) with respect to which the $c_{n,h}$ are close to being measurable (i.e. close to constant on most atoms). Proposition 9.1 then allows us to reduce the problem of establishing recurrence for $f_{U^\perp}$ to one of establishing a property very similar to recurrence for certain subsets of $\mathcal{B}$, which we can handle by combining the induction hypothesis with Proposition 6.6. As with the structure theorem, one would naively want to take $\mathcal{B}$ to be the $\sigma$-algebra generated by all the $c_{n,h}$ (and this is indeed what one does in the genuinely ergodic setting), but again we lose control of the complexity this way. Instead we must be much more selective about which $c_{n,h}$ we admit. Again, the easiest framework to implement this idea is given by the abstract energy increment lemma, Lemma 7.2. The point is that it may happen that the $c_{n,h}$ refuse to be close to measurable with respect to $\mathcal{B}$, or that other problems arise such as $\mathcal{B}$ failing to be sufficiently "shift-invariant" (this issue arose in the $d = 1$ case when one needed to eliminate the $T^{\lambda m}$ shift, although in that case the resolution to the problem was trivial). In that case, however, the simplest solution is to replace $\mathcal{B}$ by a larger $\sigma$-algebra $\mathcal{B}'$, to which one adds all the obstructions (or at least a representative sample thereof) which one encountered in closing the argument, thus increasing the energy of $\mathcal{B}$.

We turn to the details. It will suffice to establish 

Proposition 10.1 (Recurrence theorem dichotomy). Let $d \geq 2$, and suppose that Theorem 3.3 has already been proven for $d-1$. Let $k \geq 1$ be an integer, and let $M, \delta > 0$. All quantities in what follows can depend on $d, \delta, k, M$ (including the implicit bounds in $O()$ notation), and we omit future dependence on these parameters. Let $f_{U^\perp}, f_{UAP}$ be non-negative bounded functions obeying the bounds (4), (5), (6). Write $f := (f_{U^\perp}, |f_{U^\perp} - f_{UAP}|, |f_{U^\perp} - f_{UAP}|^2)$. Let $\mathcal{B} \subseteq \mathcal{B}'$ be $\sigma$-algebras which are compact of order $d-1$ with complexity at most $X$, $X'$ respectively, and such that (32) holds for some small $\tau > 0$ independent of $X, X'$ to be chosen later. Then at least one of the following must be true:

• (Success) We have

$$\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{jr} f_{U^\perp}(x) \,\Big|\, x \in Z_N, 0 \leq r < N_1\Big) \geq c(\tau,X) \qquad (44)$$

for some $c(\tau,X) > 0$ and all $N_1 > 0$.

• (Energy increment) We can find a $\sigma$-algebra $\mathcal{B}''$ finer than $\mathcal{B}'$ which is compact of order $d-1$ and complexity $O_{\tau,X,X'}(1)$ such that

$$\mathcal{E}_f(\mathcal{B}'') - \mathcal{E}_f(\mathcal{B}') \geq c(\tau,X) > 0. \qquad (45)$$

Note that the increment $c(\tau,X)$ in (45) is independent of $X'$.

Indeed, Theorem 3.3 will follow from this Proposition and Lemma 7.2 (setting $m = 3$ and letting $1/c(\tau,X)$ play the role of $M$).

It remains to prove Proposition 10.1. We will aim towards applying Proposition 9.1, by locating a large subset of $\mathcal{B}'$ where $f_{U^\perp}$ (and several of its shifts) are large on average, $f_{UAP}$ is close to $f_{U^\perp}$ on average (as are various shifts of these functions), and the $c_{n,h}$ are close to constant, and then using the induction hypothesis to obtain lower bounds on the sets obtained this way. There may be some obstructions to implementing this strategy, but when they arise we will convert those obstructions to an energy increment, establishing (45) instead of (44).

Proof of Proposition 10.1. By (6) and Definition 5.2 we can find a finite non-empty set $H$, a collection of bounded functions $(c_{n,h})_{n \in Z_N, h \in H}$ in $UAP^{d-1}$ with $\|c_{n,h}\|_{UAP^{d-1}} \leq 1$, bounded functions $(g_h)_{h \in H}$, and a random variable $h$ taking values in $H$ such that we have the representation

$$T^n f_{UAP} = M\, \mathbb{E}(c_{n,h}\, g_h) \qquad (46)$$

for all $n \in Z_N$. We cannot yet apply Proposition 9.1 since the $c_{n,h}$ are not necessarily measurable with respect to $\mathcal{B}'$; indeed there are too many of the $c_{n,h}$ to safely add all of them to $\mathcal{B}'$, which needs to have bounded complexity. Instead, we shall work with much smaller batches of $c_{n,h}$ and then average at the end.

We will need a large integer $N_0 = N_0(\tau,X) \geq 1$ to be chosen later.$^{14}$ If $N_1 < N_0$ then the claim (44) follows easily from (5) just by considering the $r = 0$ component of the left-hand side, so we will assume $N_1 \geq N_0$. We then observe that

$$\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{jr} f_{U^\perp}(x) \,\Big|\, x \in Z_N; 0 \leq r < N_1\Big) \geq c(N_0)\, \mathbb{E}\Big(\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{\mu jr} f_{U^\perp}(x) \,\Big|\, x \in Z_N, 1 \leq r \leq N_0\Big) \,\Big|\, 1 \leq \mu \leq N_1/N_0\Big) \qquad (47)$$

because each $0 \leq r < N_1$ has at most $O_{N_0}(1)$ representations of the form $r = \mu r'$ where $1 \leq r' \leq N_0$ and $1 \leq \mu \leq N_1/N_0$.

14 There are a number of parameters involved here, which are at several different scales. In order to have some idea of what parameters should be large and what parameters should be small, we suggest using the hierarchy

$$d, k, 1/\delta, M, k^* \ll 1/\tau \ll X \ll N_0 \ll X' \ll N_1, N, |H|,$$

which is a very typical arrangement of the parameters. The key points are that the energy gap $\tau$ does not depend on the large parameters $X, N_0, X', N_1, N, |H|$, that the energy increment in (34) does not depend on the very large parameters $X', N_1, N, |H|$, and the remaining bounds do not depend on the extremely large parameters $N_1, N, |H|$.
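The constant $c(N_0)$ in (47) comes from counting: for fixed $r$, each $r'$ determines $\mu = r/r'$, so $r$ has at most $N_0$ representations. A small exact check, assuming $N_1$ is not a multiple of $N_0$ (so that no product $\mu r'$ equals $N_1$) and using arbitrary nonnegative weights in place of the products $\prod_j T^{jr} f_{U^\perp}(x)$. The constant $1/(2N_0)$ below is one admissible choice of $c(N_0)$ (it follows from $\lfloor N_1/N_0 \rfloor N_0 > N_1/2$), not necessarily the one intended in the text:

```python
import random
from fractions import Fraction


def representation_counts(N0, N1):
    """counts[r] = #{(mu, r') : mu * r' = r, 1 <= r' <= N0, 1 <= mu <= N1 // N0} for r < N1."""
    counts = [0] * N1
    for mu in range(1, N1 // N0 + 1):
        for rp in range(1, N0 + 1):
            if mu * rp < N1:
                counts[mu * rp] += 1
    return counts


def two_averages(X, N0):
    """Return (E(X_r | 0 <= r < N1), E(E(X_{mu r'} | 1 <= r' <= N0) | 1 <= mu <= N1 // N0)).

    Assumes N1 = len(X) >= N0 is not a multiple of N0, so every product mu * r' is < N1.
    """
    N1 = len(X)
    M = N1 // N0
    full = Fraction(sum(X), N1)
    sub = Fraction(sum(X[mu * rp] for mu in range(1, M + 1) for rp in range(1, N0 + 1)), M * N0)
    return full, sub


N0, N1 = 10, 1005
counts = representation_counts(N0, N1)
rng = random.Random(0)
X = [rng.randint(0, 5) for _ in range(N1)]  # arbitrary nonnegative weights
full, sub = two_averages(X, N0)
print(max(counts), full >= sub / (2 * N0))
```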

Now fix a single $1 \leq \mu \leq N_1/N_0$, and consider the expression

$$\mathbb{E}\Big(\prod_{j=0}^{k-1} T^{\mu jr} f_{U^\perp}(x) \,\Big|\, x \in Z_N, 1 \leq r \leq N_0\Big). \qquad (48)$$

Observe that the exponents $\mu jr$ now range in the relatively small set $\mu \cdot \{0, \ldots, (k-1)N_0\}$. This has localized the "$n$" index in (46) to a reasonably bounded set (one which is independent of $N$), but the "$h$" parameter is still ranging over a potentially unbounded set $H$. To resolve this we need the following variant of Lemma 9.3.

Lemma 10.2 (Finite-rank approximation). Let $\mu \in Z_N$. Then we can find $h_1, \ldots, h_{N_0^{100}} \in H$ (not necessarily distinct, and depending on $\mu$) such that

$$\|\mathbb{E}(c_{\mu m,h}\, g_h) - \mathbb{E}(c_{\mu m,h_j}\, g_{h_j} \,|\, 1 \leq j \leq N_0^{100})\|_{L^2} \leq O(N_0^{-40}) \qquad (49)$$

for all $0 \leq m \leq (k-1)N_0$.

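Proof. We use the second moment method. Set $D := N_0^{100}$, $G_h := c_{\mu m,h}\, g_h$, $F := \mathbb{E}(G_h)$, and let $h_1, \ldots, h_D$ be $D$ independent samples of the random variable $h$. We will show that, for each fixed $0 \leq m \leq (k-1)N_0$,

$$\mathbb{P}\Big(\|F - \mathbb{E}(G_{h_j} | 1 \leq j \leq D)\|_{L^2} > \frac{(kN_0)^{1/2}}{D^{1/2}}\Big) \leq \frac{1}{kN_0}.$$

Since $(kN_0)^{1/2} D^{-1/2} = O(N_0^{-40})$, the union bound over the $(k-1)N_0 + 1$ values of $m$ then shows that (49) holds for all such $m$ simultaneously with probability at least $1 - \frac{(k-1)N_0+1}{kN_0} > 0$, which implies the claim. By Chebyshev's inequality it will suffice to show that

$$\mathbb{E}(\|F - \mathbb{E}(G_{h_j} | 1 \leq j \leq D)\|_{L^2}^2) \leq 1/D.$$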

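The left-hand side can be expanded as

$$\mathbb{E}\Big(\mathbb{E}\big(|F(x)|^2 - 2\,\mathrm{Re}\,\overline{F(x)}\,\mathbb{E}(G_{h_j}(x) | 1 \leq j \leq D) + |\mathbb{E}(G_{h_j}(x) | 1 \leq j \leq D)|^2 \,\big|\, x \in Z_N\big)\Big)$$

which we expand and rearrange further as

$$\|F\|_{L^2}^2 - 2\,\mathrm{Re}\,\mathbb{E}\big(\overline{F(x)}\,\mathbb{E}(G_{h_j}(x)) \,\big|\, 1 \leq j \leq D; x \in Z_N\big) + \mathbb{E}\big(\mathbb{E}(\overline{G_{h_j}(x)}\,G_{h_{j'}}(x)) \,\big|\, 1 \leq j, j' \leq D; x \in Z_N\big). \qquad (50)$$

Since $h_j, h_{j'}$ were chosen with the same distribution as $h$, and are independent when $j \neq j'$, we have

$$\mathbb{E}(G_{h_j}(x)) = \mathbb{E}(G_h(x)) = F(x)$$

and

$$\mathbb{E}(\overline{G_{h_j}(x)}\,G_{h_{j'}}(x)) = \overline{\mathbb{E}(G_h(x) | h \in H)}\,\mathbb{E}(G_h(x) | h \in H) = |F(x)|^2 \quad \text{when } j \neq j'.$$

We thus can rewrite (50) as

$$\|F\|_{L^2}^2 - 2\|F\|_{L^2}^2 + \|F\|_{L^2}^2 + \mathbb{E}\big(\delta_{jj'}(\mathbb{E}(\overline{G_{h_j}(x)}\,G_{h_{j'}}(x)) - |F(x)|^2) \,\big|\, 1 \leq j, j' \leq D; x \in Z_N\big)$$

where $\delta_{jj'}$ is the Kronecker delta. When $j = j'$, we have

$$\mathbb{E}(\overline{G_{h_j}(x)}\,G_{h_j}(x)) - |F(x)|^2 = \mathbb{E}(|G_h(x)|^2 | h \in H) - |\mathbb{E}(G_h(x) | h \in H)|^2,$$

which is at most 1 since $G_h$ is bounded (this can be sharpened, but we will not need this). Since $\delta_{jj'}$ is non-zero for only $D$ of the $D^2$ pairs $(j,j')$, the final expectation is at most $1/D$, and the claim follows. □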
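For independent samples the mean-square error in the proof of Lemma 10.2 equals $\frac{1}{D}\,\mathbb{E}_x \mathrm{Var}_h\, G_h(x)$, which is at most $1/D$ when $|G_h| \leq 1$. A sketch comparing this exact value with a seeded Monte Carlo estimate, assuming real-valued $G_h$ and $h$ uniform on a finite $H$ (the Lemma itself allows any distribution of $h$ and complex values); the sizes are arbitrary test choices:

```python
import random


def exact_mean_square_error(G, D):
    """E ||F - (1/D) sum_j G_{h_j}||_{L^2}^2 for h_1, ..., h_D iid uniform on H.

    G[h][x] are real numbers and ||u||_{L^2}^2 is the mean of u(x)^2 over x.
    By independence this equals (1/D) * mean_x Var_h G_h(x).
    """
    H, n = len(G), len(G[0])
    total = 0.0
    for x in range(n):
        column = [G[h][x] for h in range(H)]
        mean = sum(column) / H
        total += sum((v - mean) ** 2 for v in column) / H
    return total / n / D


def monte_carlo_mean_square_error(G, D, trials, rng):
    """Average of ||F - (1/D) sum_j G_{h_j}||_{L^2}^2 over independent draws of h_1..h_D."""
    H, n = len(G), len(G[0])
    F = [sum(G[h][x] for h in range(H)) / H for x in range(n)]
    acc = 0.0
    for _ in range(trials):
        hs = [rng.randrange(H) for _ in range(D)]
        approx = [sum(G[h][x] for h in hs) / D for x in range(n)]
        acc += sum((F[x] - approx[x]) ** 2 for x in range(n)) / n
    return acc / trials


rng = random.Random(0)
G = [[rng.uniform(-1, 1) for _ in range(20)] for _ in range(50)]  # |H| = 50, |Z_N| = 20
D = 10
exact = exact_mean_square_error(G, D)
estimate = monte_carlo_mean_square_error(G, D, 2000, random.Random(1))
print(round(exact, 4), round(estimate, 4), exact <= 1 / D)
```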

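Let $h_1, \ldots, h_{N_0^{100}}$ be as in the above Lemma. Then from (49) and (46) we see that

$$\|T^{\mu m} f_{UAP} - M\,\mathbb{E}(c_{\mu m,h_j}\, g_{h_j} \,|\, 1 \leq j \leq N_0^{100})\|_{L^2} \leq O(N_0^{-40}) \quad \text{for all } 0 \leq m \leq (k-1)N_0. \qquad (51)$$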

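We have now modeled a reasonably large number of shifts of our almost periodic function $f_{UAP}$ in terms of a controlled number of functions $c_{\mu m,h_j}$. Next, we define a new $\sigma$-algebra $\mathcal{B}''$ finer than $\mathcal{B}'$ (and depending on $\mu, h_1, \ldots, h_{N_0^{100}}$) by

$$\mathcal{B}'' := \Big(\bigvee_{-(k-1)N_0 \leq m \leq (k-1)N_0} T^{\mu m}\mathcal{B}'\Big) \vee \Big(\bigvee_{0 \leq m \leq (k-1)N_0;\ 1 \leq j \leq N_0^{100}} \mathcal{B}_{N_0^{-100}}(c_{\mu m,h_j})\Big)$$

where $\mathcal{B}_\varepsilon(G)$ are the $\sigma$-algebras constructed by Proposition 6.2. Since the $c_{\mu m,h_j}$ are in $UAP^{d-1}$ with norm at most 1, and $\mathcal{B}'$ was compact of order $d-1$ and complexity at most $X'$, we see from (25) and Definition 6.4 that $\mathcal{B}''$ is also compact of order $d-1$ and complexity at most $O_{N_0,X'}(1)$. Also, from (20) we have

$$\|c_{\mu m,h_j} - \mathbb{E}(c_{\mu m,h_j} | \mathcal{B}'')\|_{L^\infty} \leq O(N_0^{-100}) \quad \text{for all } 0 \leq m \leq (k-1)N_0$$

and hence (since the $g_{h_j}$ are bounded)

$$\|M\,\mathbb{E}(c_{\mu m,h_j}\, g_{h_j} | 1 \leq j \leq D) - M\,\mathbb{E}(\mathbb{E}(c_{\mu m,h_j} | \mathcal{B}'')\, g_{h_j} | 1 \leq j \leq D)\|_{L^2} \leq O(N_0^{-100}).$$

Combining this with (51) we see that

$$\|T^{\mu m} f_{UAP} - F_{\mu m}\|_{L^2} \leq O(N_0^{-40}) \quad \text{for all } 0 \leq m \leq (k-1)N_0, \qquad (52)$$

where $F_n$ is defined for $n \in Z_N$ by the formula

$$F_n := M\,\mathbb{E}(\mathbb{E}(c_{n,h_j} | \mathcal{B}'')\, g_{h_j} | 1 \leq j \leq D). \qquad (53)$$

We may then apply Proposition 9.1 (with $\mathcal{B}$ replaced by $\mathcal{B}''$) to estimate the quantity (48) as

$$(48) \geq c\, \mathbb{E}\Big(\mathbb{P}\Big(\bigcap_{m=1}^{k^*} E_{\mu\lambda m} \,\Big|\, Z_N\Big) \,\Big|\, 1 \leq \lambda \leq N_0/k^*\Big) \qquad (54)$$

where $k^* = O(1)$ and $E_{\mu\lambda m} = E_{\mu\lambda m}(k,\delta,k^*,\mathcal{B}'')$ was defined in that Proposition; recall that we are suppressing all dependence on the quantities $d, k, \delta, M$.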

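Our attention thus turns to obtaining lower bounds for the size of $E_{\mu\lambda m}$. We first use (52) to pass from $F_{\mu m}$ back to $T^{\mu m} f_{UAP}$ (modulo errors that can be made small by making $D$ large). From (52) and Cauchy-Schwarz we have

$$\mathbb{E}(\mathbb{E}(|T^{\mu m} f_{UAP} - F_{\mu m}| \,|\, \mathcal{B}'') | Z_N) = \mathbb{E}(|T^{\mu m} f_{UAP} - F_{\mu m}| \,|\, Z_N) \leq O(N_0^{-40}) \quad \text{for all } 0 \leq m \leq (k-1)N_0,$$

so by Markov's inequality

$$\mathbb{P}\big(\mathbb{E}(|T^{\mu m} f_{UAP} - F_{\mu m}| \,|\, \mathcal{B}'') > \ldots \,\big|\, Z_N\big) \leq O(N_0^{-40}) \quad \text{for all } 0 \leq m \leq (k-1)N_0.$$

In particular, from (38) and the triangle inequality we see (since $k^* = O(1)$) that

$$\mathbb{P}\Big(\bigcap_{m=1}^{k^*} E_{\mu\lambda m} \,\Big|\, Z_N\Big) \geq \mathbb{P}\Big(\bigcap_{m=1}^{k^*} E'_{\mu\lambda m} \,\Big|\, Z_N\Big) - O(N_0^{-30}), \qquad (55)$$

where

$$E'_n := \big\{x \in Z_N : \mathbb{E}(T^n f_{U^\perp} | \mathcal{B}'')(x) > \delta/2 \text{ and } \mathbb{E}(|T^n f_{U^\perp} - T^n f_{UAP}| \,|\, \mathcal{B}'')(x) < \ldots\big\}.$$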

The next step is to pull the shifts $T^n$ out of the $\mathcal{B}''$ expectations. To do this we use the following observation.

Lemma 10.3 (Effective shift invariance of $\mathcal{B}''$). Suppose that $-(k-1)N_0 \leq m \leq (k-1)N_0$ is such that

$$\|\mathbb{E}(T^{\mu m} f_{U^\perp} | \mathcal{B}'') - T^{\mu m}\mathbb{E}(f_{U^\perp} | \mathcal{B}'')\|_{L^2} > N_0^{-100}$$

or

$$\|\mathbb{E}(T^{\mu m}|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}'') - T^{\mu m}\mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}'')\|_{L^2} > N_0^{-100}.$$

Then we are in the energy increment half of the dichotomy of Proposition 10.1.

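Proof. We prove the first claim, as the second is analogous. Observe that $\mathbb{E}(T^{\mu m} f_{U^\perp} | \mathcal{B}'') = T^{\mu m}\mathbb{E}(f_{U^\perp} | T^{-\mu m}\mathcal{B}'')$, and so

$$\|\mathbb{E}(f_{U^\perp} | T^{-\mu m}\mathcal{B}'') - \mathbb{E}(f_{U^\perp} | \mathcal{B}'')\|_{L^2} > N_0^{-100}.$$

By the triangle inequality, we thus have either

$$\|\mathbb{E}(f_{U^\perp} | T^{-\mu m}\mathcal{B}'') - \mathbb{E}(f_{U^\perp} | \mathcal{B}')\|_{L^2} > \tfrac{1}{2}N_0^{-100}$$

or

$$\|\mathbb{E}(f_{U^\perp} | \mathcal{B}'') - \mathbb{E}(f_{U^\perp} | \mathcal{B}')\|_{L^2} > \tfrac{1}{2}N_0^{-100}.$$

But in either case we can use (31) (observing that $\mathcal{B}''$ and $T^{-\mu m}\mathcal{B}''$ are both finer than $\mathcal{B}'$, by construction of $\mathcal{B}''$) to obtain an energy increment (45), as desired. Similarly for the second claim (which uses the second component of $f = (f_{U^\perp}, |f_{U^\perp} - f_{UAP}|, |f_{U^\perp} - f_{UAP}|^2)$ rather than the first). □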
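On a finite cyclic group with the $\sigma$-algebra generated by a partition, the identity "conditional expectation of a shift equals the shift of a conditional expectation onto the translated $\sigma$-algebra" used in this proof can be checked exactly. This sketch assumes the convention $T^n f(x) = f(x+n)$ (with the opposite convention the partition is translated the other way); the function and partition are hypothetical test data:

```python
import random
from fractions import Fraction


def shift(f, n):
    """(T^n f)(x) = f(x + n) on Z_N (the shift convention assumed in this sketch)."""
    N = len(f)
    return [f[(x + n) % N] for x in range(N)]


def cond_exp(f, labels):
    """E(f | B) where B is generated by the partition of Z_N into level sets of labels."""
    sums, sizes = {}, {}
    for x, lab in enumerate(labels):
        sums[lab] = sums.get(lab, 0) + f[x]
        sizes[lab] = sizes.get(lab, 0) + 1
    return [Fraction(sums[lab], sizes[lab]) for lab in labels]


def translate_partition(labels, n):
    """Labels of the partition whose atoms are the atoms of `labels` moved by +n."""
    N = len(labels)
    return [labels[(x - n) % N] for x in range(N)]


rng = random.Random(3)
N = 12
f = [rng.randint(0, 9) for _ in range(N)]
labels = [rng.randint(0, 3) for _ in range(N)]  # hypothetical partition with <= 4 atoms
for n in range(N):
    lhs = cond_exp(shift(f, n), labels)
    rhs = shift(cond_exp(f, translate_partition(labels, n)), n)
    assert lhs == rhs  # E(T^n f | B) = T^n E(f | translated B)

# The quantity controlled by Lemma 10.3: how far B is from being shift-invariant.
gap = [a - b for a, b in zip(cond_exp(shift(f, 1), labels), shift(cond_exp(f, labels), 1))]
print(sum(g * g for g in gap) / N)  # squared L^2 norm of E(T f | B) - T E(f | B)
```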

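In light of this lemma, we may assume that

$$\|\mathbb{E}(T^{\mu m} f_{U^\perp} | \mathcal{B}'') - T^{\mu m}\mathbb{E}(f_{U^\perp} | \mathcal{B}'')\|_{L^2},\ \|\mathbb{E}(T^{\mu m}|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}'') - T^{\mu m}\mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}'')\|_{L^2} \leq N_0^{-100}$$

for all $-(k-1)N_0 \leq m \leq (k-1)N_0$. In particular we have

$$\mathbb{P}\big(|\mathbb{E}(T^{\mu m} f_{U^\perp} | \mathcal{B}'')(x) - T^{\mu m}\mathbb{E}(f_{U^\perp} | \mathcal{B}'')(x)| > \delta/4 \,\big|\, Z_N\big) \leq O(N_0^{-50})$$

and

$$\mathbb{P}\big(|\mathbb{E}(T^{\mu m}|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}'')(x) - T^{\mu m}\mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}'')(x)| > \ldots \,\big|\, Z_N\big) \leq O(N_0^{-50})$$

for all $0 \leq m \leq (k-1)N_0$. This allows us to estimate (since $k^* = O(1)$)

$$\mathbb{P}\Big(\bigcap_{m=1}^{k^*} E'_{\mu\lambda m} \,\Big|\, Z_N\Big) \geq \mathbb{P}\Big(\bigcap_{m=1}^{k^*} E''_{\mu\lambda m} \,\Big|\, Z_N\Big) - O(N_0^{-50}), \qquad (56)$$

where

$$E''_n := \big\{x \in Z_N : T^n\mathbb{E}(f_{U^\perp} | \mathcal{B}'')(x) > 3\delta/4 \text{ and } T^n\mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}'')(x) < \ldots\big\}.$$

Observe that $1_{E''_n} = T^n 1_{E''_0}$. Combining this with (56), (55), (54) we obtain

$$(48) \geq c\, \mathbb{E}\Big(\prod_{m=1}^{k^*} T^{\mu\lambda m} 1_{E''_0}(x) \,\Big|\, x \in Z_N; 1 \leq \lambda \leq N_0/k^*\Big) - O(N_0^{-30}). \qquad (57)$$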

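The function $1_{E''_0}$ is measurable in $\mathcal{B}''$, which is a compact $\sigma$-algebra of order $d-1$. At this point it is tempting to apply the induction hypothesis (Theorem 3.3 for $d-1$) to $1_{E''_0}$ (using Proposition 6.6) to obtain lower bounds for the right-hand side of (57). Unfortunately, the complexity of $\mathcal{B}''$ depends on $X'$, whereas the range $N_0/k^*$ of the variable $\lambda$ is only allowed to depend on $X$, and so we cannot ensure that this expectation is even positive. To resolve this we must descend from the set $E''_0 \in \mathcal{B}''$ to the slightly modified set $E''_0 \cap E$, where

$$E := \Big\{x \in Z_N : \mathbb{E}(f_{U^\perp} | \mathcal{B})(x) > 7\delta/8 \text{ and } \mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B})(x) < \frac{\delta}{64k}\Big\}.$$

Lemma 10.4. Either we have

$$\mathbb{P}(E \backslash E''_0 | Z_N) \leq O(\tau^2); \quad \mathbb{P}(E''_0 \cap E | Z_N) \geq \delta/32,$$

or we are in the energy increment half of the dichotomy.

Proof. We may assume without loss of generality that

$$\mathcal{E}_f(\mathcal{B}'') - \mathcal{E}_f(\mathcal{B}') \leq \tau^2,$$

since otherwise we would be in the energy increment half of the dichotomy. From (32) we thus have

$$\mathcal{E}_f(\mathcal{B}'') - \mathcal{E}_f(\mathcal{B}) \leq 2\tau^2,$$

which implies from (31) and the definition of $f$ that

$$\mathbb{E}(|\mathbb{E}(f_{U^\perp} | \mathcal{B}'') - \mathbb{E}(f_{U^\perp} | \mathcal{B})|^2 \,|\, Z_N) \leq 2\tau^2$$

and

$$\mathbb{E}(|\mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}'') - \mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B})|^2 \,|\, Z_N) \leq 2\tau^2.$$

In particular by Chebyshev's inequality we have

$$\mathbb{P}(|\mathbb{E}(f_{U^\perp} | \mathcal{B}'') - \mathbb{E}(f_{U^\perp} | \mathcal{B})| > \delta/8 \,|\, Z_N) \leq O(\tau^2)$$

and

$$\mathbb{P}\big(|\mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}'') - \mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B})| > \ldots \,\big|\, Z_N\big) \leq O(\tau^2),$$

and the first claim follows from the definitions of $E''_0$ and $E$.

Now we prove the second claim. From (5) we have

$$\mathbb{E}(\mathbb{E}(f_{U^\perp} | \mathcal{B}) | Z_N) = \mathbb{E}(f_{U^\perp} | Z_N) \geq \delta$$

and hence (by the boundedness of $\mathbb{E}(f_{U^\perp} | \mathcal{B})$)

$$\mathbb{P}(\mathbb{E}(f_{U^\perp} | \mathcal{B}) > 7\delta/8 \,|\, Z_N) \geq \delta/8,$$

while from (4) and Cauchy-Schwarz we have

$$\mathbb{E}(\mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}) | Z_N) = \mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, Z_N) \leq \|f_{U^\perp} - f_{UAP}\|_{L^2} \leq \frac{\delta^2}{1024k}$$

and hence by Markov's inequality

$$\mathbb{P}\Big(\mathbb{E}(|f_{U^\perp} - f_{UAP}| \,|\, \mathcal{B}) \geq \frac{\delta}{64k} \,\Big|\, Z_N\Big) \leq \delta/16.$$

By definition of $E$, we thus have $\mathbb{P}(E | Z_N) \geq \delta/16$, and the second claim of the lemma thus follows from the first if $\tau$ is chosen sufficiently small. □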
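The second claim of Lemma 10.4 combines a reverse Markov bound (if $0 \leq X \leq 1$ and $\mathbb{E}X \geq \delta$ then $\mathbb{P}(X > t) \geq \delta - t$) with Markov's inequality. A small exact sketch of both inequalities and of the resulting arithmetic $\delta/8 - \delta/16 = \delta/16$, using the constants as written above; the random samples are arbitrary test data:

```python
import random
from fractions import Fraction


def reverse_markov_holds(xs, t):
    """For values in [0, 1] under the uniform measure: P(X > t) >= E(X) - t."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    prob = Fraction(sum(1 for x in xs if x > t), n)
    return prob >= mean - t


def markov_holds(ys, t):
    """For nonnegative values under the uniform measure and t > 0: P(Y >= t) <= E(Y) / t."""
    n = len(ys)
    mean = Fraction(sum(ys), n)
    prob = Fraction(sum(1 for y in ys if y >= t), n)
    return prob <= mean / t


def lemma_10_4_lower_bound(delta, k):
    """delta/8 (reverse Markov at 7 delta/8) minus the Markov bound
    (delta^2 / 1024k) / (delta / 64k) for the second condition defining E."""
    return delta / 8 - (delta ** 2 / (1024 * k)) / (delta / (64 * k))


rng = random.Random(4)
xs = [Fraction(rng.randint(0, 100), 100) for _ in range(50)]
assert reverse_markov_holds(xs, Fraction(7, 16))
assert markov_holds([Fraction(rng.randint(0, 100), 100) for _ in range(50)], Fraction(1, 3))
print(lemma_10_4_lower_bound(Fraction(1, 2), 3))  # 1/32
```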
We may of course assume that we are not in the energy increment half of the dichotomy, in which case Lemma 10.4 implies that

$$\|1_{E''_0 \cap E} - 1_E\|_{L^2} = O(\tau).$$

Now observe that $E \in \mathcal{B}$ and $\mathcal{B}$ is compact of order $d-1$ and complexity at most $X$. Thus by Proposition 6.6 (and the preceding estimate) we can find a bounded nonnegative function $\tilde f_{UAP} \in UAP^{d-1}$ with

$$\|1_{E''_0 \cap E} - \tilde f_{UAP}\|_{L^2} = O(\tau)$$

and

$$\|\tilde f_{UAP}\|_{UAP^{d-1}} \leq O_{X,\tau}(1).$$

In view of Lemma 10.4, we can thus apply the induction hypothesis (Theorem 3.3 for $d-1$) with $f_{U^\perp}$ replaced by $1_{E''_0 \cap E}$, provided that $\tau$ is chosen sufficiently small. We conclude that

$$\mathbb{E}\Big(\prod_{m=1}^{k^*} T^{\mu\lambda m} 1_{E''_0 \cap E}(x) \,\Big|\, x \in Z_N; 1 \leq \lambda \leq N_0/k^*\Big) \geq c(\tau,X)$$

and so (since $1_{E''_0 \cap E} \leq 1_{E''_0}$) if we choose $N_0$ sufficiently large depending on $\tau, X$, we see from (57) that

$$(48) \geq c(\tau,X).$$

The claim (44) now follows from (47), and the proof of the Proposition is complete. □

The proof of Szemeredi's theorem is now complete.



11 Appendix: Proof of van der Waerden's theorem

In this section we present the standard "colour focusing" proof of van der Waerden's theorem (Theorem 1.1). Our proof presents no new ideas; we give it here only for the sake of self-containedness, and also to emphasize that this theorem is relatively simple compared to Szemeredi's theorem, and thus any argument which manages to reduce the latter to the former is a non-trivial argument.

The proof of Theorem 1.1 rests on the concept of a polychromatic fan, which we now define. We use $[a,r,k]$ to denote the arithmetic progression $a, a+r, \ldots, a+(k-1)r$.

Definition 11.1. Let $c: \{1,\ldots,N\} \to \{1,\ldots,m\}$ be a colouring, let $k \geq 1$, $d \geq 0$, and $a \in \{1,\ldots,N\}$. We define a fan of radius $k$, degree $d$, and base point $a$ to be a $d$-tuple $([a,r_1,k], \ldots, [a,r_d,k])$ of progressions in $\{1,\ldots,N\}$ of length $k$ and base point $a$, and refer to the progressions $[a+r_i, r_i, k-1]$, $1 \leq i \leq d$, as the spokes of the fan. We say that a fan is polychromatic if its base point and its $d$ spokes are all monochromatic with distinct colours. In other words, there exist distinct colours $c_0, c_1, \ldots, c_d \in \{1,\ldots,m\}$ such that $c(a) = c_0$, and $c(a + jr_i) = c_i$ for all $1 \leq i \leq d$ and $1 \leq j \leq k-1$.
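Definition 11.1 translates directly into a checker. This is a sketch assuming $k \geq 2$ and a colouring given as a dictionary on $\{1,\ldots,N\}$; the function names and the colouring in the example are illustrative only:

```python
def progression(a, r, k):
    """[a, r, k]: the arithmetic progression a, a + r, ..., a + (k - 1) r."""
    return [a + j * r for j in range(k)]


def is_polychromatic_fan(colour, a, rs, k):
    """Check Definition 11.1 for the fan ([a, r_1, k], ..., [a, r_d, k]), assuming k >= 2.

    colour is a dict on {1, ..., N}.  The base point a and each spoke
    [a + r_i, r_i, k - 1] must be monochromatic, with the d + 1 colours distinct.
    """
    colours = [colour[a]]
    for r in rs:
        spoke = progression(a + r, r, k - 1)
        if any(x not in colour for x in spoke):  # the fan must lie in {1, ..., N}
            return False
        spoke_colours = {colour[x] for x in spoke}
        if len(spoke_colours) != 1:  # spoke not monochromatic
            return False
        colours.extend(spoke_colours)
    return len(set(colours)) == len(colours)


# Hypothetical colouring of {1, ..., 9}: base point 1 red, spoke {2, 3} green, spoke {4, 7} blue.
colour = {x: "grey" for x in range(1, 10)}
colour.update({1: "red", 2: "green", 3: "green", 4: "blue", 7: "blue"})
print(is_polychromatic_fan(colour, 1, [1, 3], 3))  # True
```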
Proof of van der Waerden's theorem. We induct on $k$. The base case $k = 1$ is trivial, so suppose $k \geq 2$ and the claim has already been proven for $k-1$; thus for every $m$ there exists a positive integer $N_{vdw}(k-1,m)$ such that any $m$-colouring of $\{1,\ldots,N_{vdw}(k-1,m)\}$ contains a monochromatic progression of length $k-1$.

We now claim inductively that for all $d \geq 0$ there exists a positive integer $N_{fan}(k-1,m,d)$ such that any $m$-colouring of $\{1,\ldots,N_{fan}(k-1,m,d)\}$ contains either a monochromatic progression of length $k$, or a polychromatic fan of radius $k$ and degree $d$. The base case $d = 0$ is trivial; as soon as we prove the claim for $d = m$ we are done, as it is impossible in an $m$-colouring for a polychromatic fan to have degree larger than or equal to $m$ (such a fan would need $d + 1 > m$ distinct colours).

Assume now that $d \geq 1$ and the claim has already been proven for $d-1$. We define $N = N_{fan}(k-1,m,d)$ by the formula $N := 4kN_1N_2$, where $N_1 := N_{fan}(k-1,m,d-1)$ and $N_2 := N_{vdw}(k-1, m^d N_1^d)$, which are guaranteed to be finite by the inductive hypotheses, and let $c$ be an $m$-colouring of $\{1,\ldots,N\}$. Then for any $b \in \{1,\ldots,N_2\}$, the set $\{bkN_1 + 1, \ldots, bkN_1 + N_1\}$ is a subset of $\{1,\ldots,N\}$ of cardinality $N_1$. Applying the inductive hypothesis, we see that $\{bkN_1 + 1, \ldots, bkN_1 + N_1\}$ contains either a monochromatic progression of length $k$, or a polychromatic fan of radius $k$ and degree $d-1$. If there is at least one $b$ in which the former case applies, we are done, so suppose that the latter case applies for every $b$. This implies that for every $b \in \{1,\ldots,N_2\}$ there exist $a(b), r_1(b), \ldots, r_{d-1}(b) \in \{1,\ldots,N_1\}$ and distinct colours $c_0(b), \ldots, c_{d-1}(b) \in \{1,\ldots,m\}$ such that $c(bkN_1 + a(b)) = c_0(b)$ and $c(bkN_1 + a(b) + jr_i(b)) = c_i(b)$ for all $1 \leq j \leq k-1$ and $1 \leq i \leq d-1$. In particular the map $b \mapsto (a(b), r_1(b), \ldots, r_{d-1}(b), c_0(b), \ldots, c_{d-1}(b))$ is a colouring of $\{1,\ldots,N_2\}$ by $m^d N_1^d$ colours (which we may enumerate as $\{1,\ldots,m^d N_1^d\}$ in some arbitrary fashion). Thus by definition of $N_2$ there exists a monochromatic arithmetic progression $[b,s,k-1]$ of length $k-1$ in $\{1,\ldots,N_2\}$, with some colour $(a, r_1, \ldots, r_{d-1}, c_0, \ldots, c_{d-1})$. We may assume without loss of generality that $s$ is negative since we can simply reverse the progression if $s$ is positive.

Now we use an algebraic trick (similar to Cantor's famous diagonalization trick) which will convert a progression of identical fans into a new fan of one higher degree, the base points of the original fans being used to form the additional spoke of the new fan. Introduce the base point $b_0 := (b-s)kN_1 + a$, which lies in $\{1,\ldots,N\}$ by construction of $N$, and consider the fan

$$([b_0, skN_1, k], [b_0, skN_1 + r_1, k], \ldots, [b_0, skN_1 + r_{d-1}, k])$$

of radius $k$, degree $d$, and base point $b_0$. We observe that all the spokes of this fan are monochromatic. For the first spoke this is because

$$c(b_0 + jskN_1) = c((b+(j-1)s)kN_1 + a) = c_0(b+(j-1)s) = c_0 \quad \text{for all } 1 \leq j \leq k-1$$

and for the remaining spokes this is because

$$c(b_0 + j(skN_1 + r_i)) = c((b+(j-1)s)kN_1 + a + jr_i) = c_i(b+(j-1)s) = c_i \quad \text{for all } 1 \leq j \leq k-1,\ 1 \leq i \leq d-1.$$

If the base point $b_0$ has the same colour as one of the spokes, then we have found a monochromatic progression of length $k$; if the base point $b_0$ has a colour distinct from all of the spokes (which themselves carry the distinct colours $c_0, \ldots, c_{d-1}$), we have found a polychromatic fan of radius $k$ and degree $d$. In either case we have verified the inductive claim, and the proof of Theorem 1.1 is complete. □
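For very small parameters the least admissible value of $N_{vdw}(k,m)$ can be found by exhaustive search, which gives a point of comparison for the bounds produced by the induction. A brute-force sketch, feasible only for tiny $k$ and $m$; it recovers the classical values $4$ for $(k,m) = (2,3)$ and $9$ for $(k,m) = (3,2)$:

```python
from itertools import product


def has_mono_ap(colouring, k):
    """True iff the colouring of {1, ..., N} (a tuple; index i is element i + 1)
    is constant on some k-term progression a, a + r, ..., a + (k - 1) r with r >= 1."""
    N = len(colouring)
    if k == 1:
        return N >= 1
    for r in range(1, N):
        for a in range(N - (k - 1) * r):  # keeps a + (k - 1) r <= N - 1
            if len({colouring[a + j * r] for j in range(k)}) == 1:
                return True
    return False


def vdw_number(k, m):
    """Least N such that every m-colouring of {1, ..., N} has a monochromatic
    k-term progression (exhaustive search over all m**N colourings)."""
    N = 1
    while not all(has_mono_ap(c, k) for c in product(range(m), repeat=N)):
        N += 1
    return N


print(vdw_number(2, 3), vdw_number(3, 2))  # 4 9
```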

Remark 11.2. The bounds on $N_{vdw}(k,m)$ obtained by this method are clearly of Ackermann type, and are extremely far from best possible. The first primitive recursive bound on $N_{vdw}(k,m)$ is due to Shelah [33]. Currently the best known bound for $N_{vdw}(k,m)$ is

$$N_{vdw}(k,m) \leq 2^{2^{m^{2^{2^{k+9}}}}},$$

due to Gowers [19]. In contrast to arguments such as the one presented here, in which one deduces Szemeredi-type theorems from van der Waerden-type theorems, the bound obtained by Gowers is in fact derived by the converse procedure, in which one first proves Szemeredi's theorem (without recourse to the van der Waerden theorem) and then deduces the above bound on van der Waerden's theorem as a consequence.
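The Ackermann-type growth becomes visible by simply running the recursion from the proof: $N_{fan}(k-1,m,0) = 1$, $N_{fan}(k-1,m,d) = 4kN_1N_2$ with $N_1 = N_{fan}(k-1,m,d-1)$ and $N_2 = N_{vdw}(k-1, m^d N_1^d)$, and $N_{vdw}(k,m) = N_{fan}(k-1,m,m)$, starting from $N_{vdw}(1,m) = 1$. A sketch assuming exactly this recursion; for $k = 2$ it collapses to $8^m$, which the code uses as a shortcut:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def vdw_bound(k, m):
    """Upper bound for N_vdw(k, m) produced by the colour-focusing recursion.

    Only tiny inputs are feasible: already k = 4, m = 2 is far out of reach.
    """
    if k == 1:
        return 1  # a single point is a progression of length 1
    if k == 2:
        # Here N_fan(1, m, d) = 4 * 2 * N_fan(1, m, d - 1) * N_vdw(1, .) = 8 * N_fan(1, m, d - 1).
        return 1 << (3 * m)  # 8**m
    n_fan = 1  # N_fan(k - 1, m, 0)
    for d in range(1, m + 1):
        n1 = n_fan
        n2 = vdw_bound(k - 1, m ** d * n1 ** d)
        n_fan = 4 * k * n1 * n2
    return n_fan  # N_fan(k - 1, m, m)


print(vdw_bound(2, 3), vdw_bound(3, 1), vdw_bound(3, 2).bit_length())  # 512 96 7077902
```

Already for $(k,m) = (3,2)$ the recursion gives $9216 \cdot 2^{7077888}$, a number with 7,077,902 binary digits, whereas the true value found by exhaustive search is 9.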

Remark 11.3. One can modify the above colour focusing technique to prove polynomial or Hales-Jewett versions of van der Waerden's theorem, see for instance [42].